# Related Articles

This article proposes a joint model for longitudinal measurements and competing risks survival data. The model consists of a linear mixed effects sub-model with t-distributed measurement errors for the longitudinal outcome, a proportional cause-specific hazards frailty sub-model for the survival outcome, and a regression sub-model for the variance-covariance matrix of the multivariate latent random effects based on a modified Cholesky decomposition. A Bayesian MCMC procedure is developed for parameter estimation and inference. Our method is insensitive to outlying longitudinal measurements in the presence of non-ignorable missing data due to dropout. Moreover, by modeling the variance-covariance matrix of the latent random effects, our model provides a useful framework for handling high-dimensional heterogeneous random effects and testing the homogeneous random effects assumption which is otherwise untestable in commonly used joint models. Finally, our model enables analysis of a survival outcome with intermittently measured time-dependent covariates and possibly correlated competing risks and dependent censoring, as well as joint analysis of the longitudinal and survival outcomes. Illustrations are given using a real data set from a lung study and simulation.

PMCID: PMC3166346
PMID: 21892381

Joint model; Competing risks; Bayesian analysis; Cholesky decomposition; Mixed effects model; MCMC; Modeling random effects covariance matrix; Outlier

In the mixed linear model, the random effects are usually assumed to be normally distributed in both the Bayesian and classical frameworks. In this paper, the Dirichlet process prior was used to provide nonparametric Bayesian estimates for correlated random effects. This goal was achieved by providing a Gibbs sampler algorithm that allows these correlated random effects to have a nonparametric prior distribution. A sampling-based method is illustrated. This method, which works by transforming the genetic covariance matrix to an identity matrix so that the random effects are uncorrelated, is an extension of the theory and the results of previous researchers. Also, by using Gibbs sampling and data augmentation, a simulation procedure was derived for estimating the precision parameter M associated with the Dirichlet process prior. All needed conditional posterior distributions are given. To illustrate the application, data from the Elsenburg Dormer sheep stud were analysed. A total of 3325 weaning weight records from the progeny of 101 sires were used.

doi:10.1186/1297-9686-35-2-137

PMCID: PMC2732692
PMID: 12633530

Bayesian methods; mixed linear model; Dirichlet process prior; correlated random effects; Gibbs sampler
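As a rough illustration of the role of the precision parameter M, the sketch below draws truncated weights from the standard stick-breaking representation of a Dirichlet process. It is not the Gibbs sampler of the article; the function name `stick_breaking_weights`, the truncation level, and the seed are assumptions made for the example.

```python
import random

def stick_breaking_weights(M, truncation, seed=0):
    """Truncated stick-breaking weights of a Dirichlet process with precision M.

    Hypothetical helper for illustration only: each break v_k ~ Beta(1, M)
    takes a fraction of the stick that is still unallocated.
    """
    rng = random.Random(seed)
    weights = []
    remaining = 1.0  # length of stick not yet assigned to any atom
    for _ in range(truncation):
        v = rng.betavariate(1.0, M)
        weights.append(remaining * v)
        remaining *= 1.0 - v
    return weights, remaining

# Small M concentrates mass on a few atoms; large M spreads it over many.
few_atoms, _ = stick_breaking_weights(M=0.5, truncation=20)
many_atoms, _ = stick_breaking_weights(M=50.0, truncation=20)
```

The returned `remaining` value is the mass left beyond the truncation point, which gives a quick check on whether the truncation level is large enough for a given M.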

Summary

We study the role of partial autocorrelations in the reparameterization and parsimonious modeling of a covariance matrix. The work is motivated by and tries to mimic the phenomenal success of the partial autocorrelation function (PACF) in model formulation, removing the positive-definiteness constraint on the autocorrelation function of a stationary time series and in reparameterizing the stationarity-invertibility domain of ARMA models. It turns out that once an order is fixed among the variables of a general random vector, then the above properties continue to hold and follow from establishing a one-to-one correspondence between a correlation matrix and its associated matrix of partial autocorrelations. Connections between the latter and the parameters of the modified Cholesky decomposition of a covariance matrix are discussed. Graphical tools similar to partial correlograms for model formulation and various priors based on the partial autocorrelations are proposed. We develop frequentist/Bayesian procedures for modelling correlation matrices, illustrate them using a real dataset, and explore their properties via simulations.

doi:10.1016/j.jmva.2009.04.015

PMCID: PMC2748961
PMID: 20161018

Autoregressive parameters; Cholesky decomposition; Positive-definiteness constraint; Levinson-Durbin algorithm; Prediction variances; Uniform and Reference Priors; Markov Chain Monte Carlo
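For the stationary time series case that motivates this work, the correspondence between partial autocorrelations and autocorrelations can be sketched with the Levinson-Durbin recursion. The code below is a minimal sketch under that stationarity assumption, not the general-matrix construction of the article; the function name `pacf_to_acf` is hypothetical.

```python
def pacf_to_acf(pacf):
    """Map partial autocorrelations (lags 1..p, each in (-1, 1)) to autocorrelations.

    Uses the Levinson-Durbin recursion run in reverse. Any values in (-1, 1)
    give a valid autocorrelation sequence, so the inputs need no joint constraint.
    """
    acf = []   # acf[h-1] = autocorrelation at lag h
    phi = []   # AR coefficients of the order k-1 predictor
    v = 1.0    # scaled prediction variance: product of (1 - pacf_i^2)
    for k, pk in enumerate(pacf, start=1):
        rho = [1.0] + acf  # rho[h] = autocorrelation at lag h, lag 0 included
        rho_k = pk * v + sum(phi[j - 1] * rho[k - j] for j in range(1, k))
        acf.append(rho_k)
        # Update the AR coefficients from order k-1 to order k.
        phi = [phi[j - 1] - pk * phi[k - j - 1] for j in range(1, k)] + [pk]
        v *= 1.0 - pk * pk
    return acf

# An AR(1) process has a single nonzero partial autocorrelation.
ar1_acf = pacf_to_acf([0.5, 0.0, 0.0])
```

With one nonzero partial autocorrelation of 0.5, the recursion reproduces the geometric decay of an AR(1) autocorrelation function.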

Summary

We develop a new class of models, dynamic conditionally linear mixed models, for longitudinal data by decomposing the within-subject covariance matrix using a special Cholesky decomposition. Here ‘dynamic’ means using past responses as covariates and ‘conditional linearity’ means that parameters entering the model linearly may be random, but nonlinear parameters are nonrandom. This setup offers several advantages and is surprisingly similar to models obtained from the first-order linearization method applied to nonlinear mixed models. First, it allows for flexible and computationally tractable models that include a wide array of covariance structures; these structures may depend on covariates and hence may differ across subjects. This class of models includes, e.g., all standard linear mixed models, antedependence models, and Vonesh–Carter models. Second, it guarantees the fitted marginal covariance matrix of the data is positive definite. We develop methods for Bayesian inference and motivate the usefulness of these models using a series of longitudinal depression studies for which the features of these new models are well suited.

PMCID: PMC2755537
PMID: 11890319

Covariance matrix; Heterogeneity; Hierarchical models; Markov chain Monte Carlo; Missing data; Unconstrained parameterization
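The positive-definiteness guarantee of a Cholesky-type parameterization can be sketched as follows: unconstrained autoregressive coefficients and log innovation variances always map to a valid covariance matrix. This is a generic illustration, not the dynamic conditionally linear mixed model itself; the function name and the argument layout (`phi[t][j]` for j < t, `log_d[t]`) are assumptions.

```python
import math

def covariance_from_unconstrained(phi, log_d):
    """Build a covariance matrix from unconstrained parameters.

    Assumed layout: phi[t][j] (j < t) is the coefficient of y_j when
    regressing y_t on its predecessors; log_d[t] is the log innovation variance.
    """
    n = len(log_d)
    d = [math.exp(v) for v in log_d]  # innovation variances, always positive
    # C[t][k]: weight of innovation e_k in y_t, from y_t = sum_j phi[t][j] y_j + e_t
    C = [[0.0] * n for _ in range(n)]
    for t in range(n):
        C[t][t] = 1.0
        for j in range(t):
            for k in range(j + 1):
                C[t][k] += phi[t][j] * C[j][k]
    # Sigma = C D C', positive definite for any real phi and log_d
    return [[sum(C[i][k] * d[k] * C[j][k] for k in range(n)) for j in range(n)]
            for i in range(n)]

# y_0 = e_0 and y_1 = 2 * y_0 + e_1 with unit innovation variances
sigma = covariance_from_unconstrained([[], [2.0]], [0.0, 0.0])
```

Because the quadratic form of the result is a sum of squares weighted by the positive innovation variances, no constraint on the inputs is needed, which is what makes covariate-dependent covariance structures convenient in this kind of parameterization.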

This article studies a general joint model for longitudinal measurements and competing risks survival data. The model consists of a linear mixed effects sub-model for the longitudinal outcome, a proportional cause-specific hazards frailty sub-model for the competing risks survival data, and a regression sub-model for the variance–covariance matrix of the multivariate latent random effects based on a modified Cholesky decomposition. The model provides a useful approach to adjust for non-ignorable missing data due to dropout for the longitudinal outcome, enables analysis of the survival outcome with informative censoring and intermittently measured time-dependent covariates, as well as joint analysis of the longitudinal and survival outcomes. Unlike previously studied joint models, our model allows for heterogeneous random covariance matrices. It also offers a framework to assess the homogeneous covariance assumption of existing joint models. A Bayesian MCMC procedure is developed for parameter estimation and inference. Its performances and frequentist properties are investigated using simulations. A real data example is used to illustrate the usefulness of the approach.

doi:10.1007/s10985-010-9169-6

PMCID: PMC3162577
PMID: 20549344

Cause-specific hazard; Bayesian analysis; Cholesky decomposition; Mixed effects model; MCMC; Modeling covariance matrices

Summary

Many parameters and positive-definiteness are two major obstacles in estimating and modelling a correlation matrix for longitudinal data. In addition, when longitudinal data is incomplete, incorrectly modelling the correlation matrix often results in bias in estimating mean regression parameters. In this paper, we introduce a flexible and parsimonious class of regression models for a covariance matrix parameterized using marginal variances and partial autocorrelations. The partial autocorrelations can freely vary in the interval (–1, 1) while maintaining positive definiteness of the correlation matrix so the regression parameters in these models will have no constraints. We propose a class of priors for the regression coefficients and examine the importance of correctly modeling the correlation structure on estimation of longitudinal (mean) trajectories and the performance of the DIC in choosing the correct correlation model via simulations. The regression approach is illustrated on data from a longitudinal clinical trial.

doi:10.1016/j.jmva.2012.11.010

PMCID: PMC3640593
PMID: 23645941

Markov Chain Monte Carlo; Generalized linear model; Uniform prior

Longitudinal data are routinely collected in biomedical research studies. A natural model describing longitudinal data decomposes an individual’s outcome as the sum of a population mean function and random subject-specific deviations. When parametric assumptions are too restrictive, methods modeling the population mean function and the random subject-specific functions nonparametrically are in demand. In some applications, it is desirable to estimate a covariance function of random subject-specific deviations. In this work, flexible yet computationally efficient methods are developed for a general class of semiparametric mixed effects models, where the functional forms of the population mean and the subject-specific curves are unspecified. We estimate nonparametric components of the model by penalized spline (P-spline, [1]), and reparametrize the random curve covariance function by a modified Cholesky decomposition [2] which allows for unconstrained estimation of a positive semidefinite matrix. To provide smooth estimates, we penalize roughness of fitted curves and derive closed form solutions in the maximization step of an EM algorithm. In addition, we present models and methods for longitudinal family data where subjects in a family are correlated and we decompose the covariance function into a subject-level source and observation-level source. We apply these methods to the multi-level Framingham Heart Study data to estimate age-specific heritability of systolic blood pressure (SBP) nonparametrically.

doi:10.1002/sim.4236

PMCID: PMC3115522
PMID: 21491474

Multi-level functional data; Cholesky decomposition; Age-specific heritability; Framingham Heart Study

Many phenomena of fundamental importance to biology and biomedicine arise as a dynamic curve, such as organ growth and HIV dynamics. The genetic mapping of these traits is challenged by longitudinal variables measured at irregular and possibly subject-specific time points, in which case nonnegative definiteness of the estimated covariance matrix needs to be guaranteed. We present a semiparametric approach for genetic mapping within the mixture-model setting by jointly modeling mean and covariance structures for irregular longitudinal data. Penalized splines are used to model the mean functions of individual QTL genotypes as latent variables, while an extended generalized linear model is used to approximate the covariance matrix. The parameters for modeling the mean-covariances are estimated by MCMC, using a Gibbs sampler and the Metropolis-Hastings algorithm. We derive the full conditional distributions for the mean and covariance parameters and compute Bayes factors to test the hypothesis about the existence of significant QTLs. The model was used to screen the existence of specific QTLs for age-specific change of body mass index with a sparse longitudinal dataset. The new model provides a powerful means for broadening the application of genetic mapping to reveal the genetic control of dynamic traits.

doi:10.1002/sim.5535

PMCID: PMC3770845
PMID: 22903809

Cholesky decomposition; genetic mapping; MCMC; penalized spline; quantitative trait loci

Summary

Estimation of the covariance structure for irregular sparse longitudinal data has been studied by many authors in recent years but typically using fully parametric specifications. In addition, when data are collected from several groups over time, it is known that assuming the same or completely different covariance matrices over groups can lead to loss of efficiency and/or bias. Nonparametric approaches have been proposed for estimating the covariance matrix for regular univariate longitudinal data by sharing information across the groups under study. For the irregular case, with longitudinal measurements that are bivariate or multivariate, modeling becomes more difficult. In this article, to model bivariate sparse longitudinal data from several groups, we propose a flexible covariance structure via a novel matrix stick-breaking process for the residual covariance structure and a Dirichlet process mixture of normals for the random effects. Simulation studies are performed to investigate the effectiveness of the proposed approach over more traditional approaches. We also analyze a subset of Framingham Heart Study data to examine how the blood pressure trajectories and covariance structures differ for the patients from different BMI groups (high, medium and low) at baseline.

doi:10.1111/biom.12133

PMCID: PMC3954444
PMID: 24400941

Covariance matrix; DIC; Dirichlet process mixture of normals; MCMC

Summary

In the analysis of longitudinal data, it is not uncommon that observation times of repeated measurements are subject-specific and correlated with underlying longitudinal outcomes. Taking account of the dependence between observation times and longitudinal outcomes is critical in these situations to assure the validity of statistical inference. In this article, we propose a flexible joint model for longitudinal data analysis in the presence of informative observation times. In particular, the new procedure considers the shared random-effect model and assumes a time-varying coefficient for the latent variable, allowing a flexible way of modeling longitudinal outcomes while adjusting their association with observation times. Estimating equations are developed for parameter estimation. We show that the resulting estimators are consistent and asymptotically normal, with a variance-covariance matrix that has a closed form and can be consistently estimated by the usual plug-in method. One additional advantage of the procedure is that it provides a unified framework to test whether the effect of the latent variable is zero, constant, or time-varying. Simulation studies show that the proposed approach is appropriate for practical use. An application to bladder cancer data is also given to illustrate the methodology.

doi:10.1111/j.1541-0420.2012.01794.x

PMCID: PMC3543780
PMID: 23025338

Estimating equation method; Informative observation times; Longitudinal data analysis; Time-varying effect

We focus on sparse modelling of high-dimensional covariance matrices using Bayesian latent factor models. We propose a multiplicative gamma process shrinkage prior on the factor loadings which allows introduction of infinitely many factors, with the loadings increasingly shrunk towards zero as the column index increases. We use our prior on a parameter-expanded loading matrix to avoid the order dependence typical in factor analysis models and develop an efficient Gibbs sampler that scales well as data dimensionality increases. The gain in efficiency is achieved by the joint conjugacy property of the proposed prior, which allows block updating of the loadings matrix. We propose an adaptive Gibbs sampler for automatically truncating the infinite loading matrix through selection of the number of important factors. Theoretical results are provided on the support of the prior and truncation approximation bounds. A fast algorithm is proposed to produce approximate Bayes estimates. Latent factor regression methods are developed for prediction and variable selection in applications with high-dimensional correlated predictors. Operating characteristics are assessed through simulation studies, and the approach is applied to predict survival times from gene expression data.

doi:10.1093/biomet/asr013

PMCID: PMC3419391
PMID: 23049129

Adaptive Gibbs sampling; Factor analysis; High-dimensional data; Multiplicative gamma process; Parameter expansion; Regularization; Shrinkage
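The increasing shrinkage across columns can be sketched by drawing loadings from a multiplicative gamma process prior. The sketch below assumes the usual specification, with local precisions drawn from Gamma(nu/2, rate nu/2) and column precisions formed as cumulative products of gamma variables; the function name and the hyperparameter defaults (`a1`, `a2`, `nu`) are assumptions chosen for illustration, and this is a prior draw, not the adaptive Gibbs sampler.

```python
import random

def sample_mgp_loadings(p, k, a1=2.0, a2=3.0, nu=3.0, seed=0):
    """Draw a p x k loadings matrix from a multiplicative gamma process prior.

    Hypothetical defaults. Column precision tau_h is the running product of
    delta_1 ~ Gamma(a1, 1) and delta_l ~ Gamma(a2, 1) for l >= 2, so later
    columns tend to be shrunk harder when a2 > 1.
    """
    rng = random.Random(seed)
    delta = [rng.gammavariate(a1, 1.0)]
    delta += [rng.gammavariate(a2, 1.0) for _ in range(k - 1)]
    tau, running = [], 1.0
    for value in delta:
        running *= value
        tau.append(running)
    loadings = []
    for _ in range(p):
        row = []
        for h in range(k):
            # gammavariate takes (shape, scale); rate nu/2 means scale 2/nu
            local = rng.gammavariate(nu / 2.0, 2.0 / nu)
            row.append(rng.gauss(0.0, (local * tau[h]) ** -0.5))
        loadings.append(row)
    return loadings, tau

loadings, tau = sample_mgp_loadings(p=10, k=6)
```

Comparing the spread of early and late columns of `loadings` across several seeds gives a feel for how quickly the prior pushes higher-index factors towards zero.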

Flexible multilevel models are proposed to allow for cluster-specific smooth estimation of growth curves in a mixed-effects modeling format that includes subject-specific random effects on the growth parameters. Attention is then focused on models that examine between-cluster comparisons of the effects of an ecologic covariate of interest (e.g. air pollution) on nonlinear functionals of growth curves (e.g. maximum rate of growth). A Gibbs sampling approach is used to get posterior mean estimates of nonlinear functionals along with their uncertainty estimates. A second-stage ecologic random-effects model is used to examine the association between a covariate of interest (e.g. air pollution) and the nonlinear functionals. A unified estimation procedure is presented along with its computational and theoretical details. The models are motivated by, and illustrated with, lung function and air pollution data from the Southern California Children's Health Study.

doi:10.1093/biostatistics/kxm059

PMCID: PMC2733176
PMID: 18349036

Air pollution; Correlated data; Growth curves; Mixed-effects; Splines

Linear mixed effects (LME) models are useful for longitudinal data/repeated measurements. We propose a new class of covariate-adjusted LME models for longitudinal data that nonparametrically adjusts for a normalizing covariate. The proposed approach involves fitting a parametric LME model to the data after adjusting for the nonparametric effects of a baseline confounding covariate. In particular, the effect of the observable covariate on the response and predictors of the LME model is modeled nonparametrically via smooth unknown functions. In addition to covariate-adjusted estimation of fixed/population parameters and random effects, an estimation procedure for the variance components is also developed. Numerical properties of the proposed estimators are investigated with simulation studies. The consistency and convergence rates of the proposed estimators are also established. An application to a longitudinal data set on calcium absorption, accounting for baseline distortion from body mass index, illustrates the proposed methodology.

doi:10.1080/10485250802226435

PMCID: PMC2650843
PMID: 19266053

Binning; Covariance structure; Covariate-adjusted regression (CAR); Longitudinal data; Mixed model; Multiplicative effect; Varying coefficient models

Missing covariate data is common in observational studies of time to an event, especially when covariates are repeatedly measured over time. Failure to account for the missing data can lead to bias or loss of efficiency, especially when the data are non-ignorably missing. Previous work has focused on the case of fixed covariates rather than those that are repeatedly measured over the follow-up period, so here we present a selection model that allows for proportional hazards regression with time-varying covariates when some covariates may be non-ignorably missing. We develop a fully Bayesian model and obtain posterior estimates of the parameters via the Gibbs sampler in WinBUGS. We illustrate our model with an analysis of post-diagnosis weight change and survival after breast cancer diagnosis in the Long Island Breast Cancer Study Project (LIBCSP) follow-up study. Our results indicate that post-diagnosis weight gain is associated with lower all-cause and breast cancer specific survival among women diagnosed with new primary breast cancer. Our sensitivity analysis showed only slight differences between models with different assumptions on the missing data mechanism yet the complete case analysis yielded markedly different results.

doi:10.1002/sim.4076

PMCID: PMC3253577
PMID: 20960582

proportional hazards regression; non-ignorably missing data; missing covariates; selection model

Simulated data were used to investigate the influence of the choice of priors on estimation of genetic parameters in multivariate threshold models using Gibbs sampling. We simulated additive values, residuals and fixed effects for one continuous trait and liabilities of four binary traits, and QTL effects for one of the liabilities. Within each of four replicates six different datasets were generated which resembled different practical scenarios in horses with respect to number and distribution of animals with trait records and availability of QTL information. (Co)Variance components were estimated using a Bayesian threshold animal model via Gibbs sampling. The Gibbs sampler was implemented with both a flat and a proper prior for the genetic covariance matrix. Convergence problems were encountered in > 50% of flat prior analyses, with indications of potential or near posterior impropriety between about round 10 000 and 100 000. Terminations due to non-positive definite genetic covariance matrix occurred in flat prior analyses of the smallest datasets. Use of a proper prior resulted in improved mixing and convergence of the Gibbs chain. In order to avoid (near) impropriety of posteriors and extremely poorly mixing Gibbs chains, a proper prior should be used for the genetic covariance matrix when implementing the Gibbs sampler.

doi:10.1186/1297-9686-39-2-123

PMCID: PMC2682833
PMID: 17306197

Gibbs sampling; multivariate threshold model; covariance estimates; flat prior; proper prior

Background

It has been argued that multibreed animal models should include a heterogeneous covariance structure. However, the estimation of the (co)variance components is not an easy task, because these parameters cannot be factored out from the inverse of the additive genetic covariance matrix. An alternative model, based on the decomposition of the genetic covariance matrix by source of variability, provides a much simpler formulation. In this study, we formalize the equivalence between this alternative model and the one derived from the quantitative genetic theory. Further, we extend the model to include maternal effects and, in order to estimate the (co)variance components, we describe a hierarchical Bayes implementation. Finally, we apply the model to weaning weight data from an Angus × Hereford crossbred experiment.

Methods

Our argument is based on redefining the vectors of breeding values by breed origin such that they do not include individuals with null contributions. Next, we define matrices that retrieve the null-row and the null-column pattern and, by means of appropriate algebraic operations, we demonstrate the equivalence. The extension to include maternal effects and the estimation of the (co)variance components through the hierarchical Bayes analysis are then straightforward. A FORTRAN 90 Gibbs sampler was specifically programmed and executed to estimate the (co)variance components of the Angus × Hereford population.

Results

In general, genetic (co)variance components showed marginal posterior densities with a high degree of symmetry, except for the segregation components. Angus and Hereford breeds contributed with 50.26% and 41.73% of the total direct additive variance, and with 23.59% and 59.65% of the total maternal additive variance. In turn, the contribution of the segregation variance was not significant in either case, which suggests that the allelic frequencies in the two parental breeds were similar.

Conclusion

The multibreed maternal animal model introduced in this study simplifies the problem of estimating (co)variance components in the framework of a hierarchical Bayes analysis. Using this approach, we obtained for the first time estimates of the full set of genetic (co)variance components. It would be interesting to assess the performance of the procedure with field data, especially when interbreed information is limited.

doi:10.1186/1297-9686-42-20

PMCID: PMC2909157
PMID: 20540758

Longitudinal data arise frequently in medical studies and it is a common practice to analyze such complex data with nonlinear mixed-effects (NLME) models which enable us to account for between-subject and within-subject variations. To partially explain the variations, covariates are usually introduced to these models. Some covariates, however, may often be measured with substantial errors. It is often the case that model random error is assumed to be distributed normally, but the normality assumption may not always give robust and reliable results, particularly if the data exhibit skewness. Although there has been considerable interest in accommodating either skewness or covariate measurement error in the literature, there is relatively little work that considers both features simultaneously. In this article, our objectives are to address the simultaneous impact of skewness and covariate measurement error by jointly modeling the response and covariate processes under a general framework of Bayesian semiparametric nonlinear mixed-effects models. The method is illustrated in an AIDS data example to compare potential models which have different distributional specifications. The findings from this study suggest that the models with a skew-normal distribution may provide more reasonable results if the data exhibit skewness and/or have measurement errors in covariates.

doi:10.1214/12-BA706

PMCID: PMC3584628
PMID: 23459161

Bayesian approach; Covariate measurement errors; HIV/AIDS; Joint models; Longitudinal data; Semiparametric nonlinear mixed-effects models; Skew-normal distribution

A fully Bayesian analysis using Gibbs sampling and data augmentation in a multivariate model of Gaussian, right censored, and grouped Gaussian traits is described. The grouped Gaussian traits are either ordered categorical traits (with more than two categories) or binary traits, where the grouping is determined via thresholds on the underlying Gaussian scale, the liability scale. Allowances are made for unequal models, unknown covariance matrices and missing data. Having outlined the theory, strategies for implementation are reviewed. These include joint sampling of location parameters; efficient sampling from the fully conditional posterior distribution of augmented data, a multivariate truncated normal distribution; and sampling from the conditional inverse Wishart distribution, the fully conditional posterior distribution of the residual covariance matrix. Finally, a simulated dataset was analysed to illustrate the methodology. This paper concentrates on a model where residuals associated with liabilities of the binary traits are assumed to be independent. A Bayesian analysis using Gibbs sampling is outlined for the model where this assumption is relaxed.

doi:10.1186/1297-9686-35-2-159

PMCID: PMC2732693
PMID: 12633531

categorical; Gaussian; multivariate Bayesian analysis; right censored Gaussian

We consider the problem of jointly modeling survival time and longitudinal data subject to measurement error. The survival times are modeled through the proportional hazards model and a random effects model is assumed for the longitudinal covariate process. Under this framework, we propose an approximate nonparametric corrected-score estimator for the parameter, which describes the association between the time-to-event and the longitudinal covariate. The term nonparametric refers to the fact that assumptions regarding the distribution of the random effects and that of the measurement error are unnecessary. The finite sample size performance of the approximate nonparametric corrected-score estimator is examined through simulation studies and its asymptotic properties are also developed. Furthermore, the proposed estimator and some existing estimators are applied to real data from an AIDS clinical trial.

doi:10.1002/bimj.201000180

PMCID: PMC3724540
PMID: 21717494

Corrected score; Cumulant generating function; Measurement error; Proportional hazards; Random effects

SUMMARY

We consider Bayesian inference in semiparametric mixed models (SPMMs) for longitudinal data. SPMMs are a class of models that use a nonparametric function to model a time effect, a parametric function to model other covariate effects, and parametric or nonparametric random effects to account for the within-subject correlation. We model the nonparametric function using a Bayesian formulation of a cubic smoothing spline, and the random effect distribution using a normal distribution and alternatively a nonparametric Dirichlet process (DP) prior. When the random effect distribution is assumed to be normal, we propose a uniform shrinkage prior (USP) for the variance components and the smoothing parameter. When the random effect distribution is modeled nonparametrically, we use a DP prior with a normal base measure and propose a USP for the hyperparameters of the DP base measure. We argue that the commonly assumed DP prior implies a nonzero mean of the random effect distribution, even when a base measure with mean zero is specified. This implies weak identifiability for the fixed effects, and can therefore lead to biased estimators and poor inference for the regression coefficients and the spline estimator of the nonparametric function. We propose an adjustment using a postprocessing technique. We show that under mild conditions the posterior is proper under the proposed USP, a flat prior for the fixed effect parameters, and an improper prior for the residual variance. We illustrate the proposed approach using a longitudinal hormone dataset, and carry out extensive simulation studies to compare its finite sample performance with existing methods.

doi:10.1111/j.1541-0420.2009.01227.x

PMCID: PMC3081790
PMID: 19432777

Dirichlet process prior; Identifiability; Postprocessing; Random effects; Smoothing spline; Uniform shrinkage prior; Variance components

Joint models are frequently used in survival analysis to assess the relationship between time-to-event data and time-dependent covariates, which are measured longitudinally but often with errors. Routinely, a linear mixed-effects model is used to describe the longitudinal data process, while the survival times are assumed to follow the proportional hazards model. However, in some practical situations, individual covariate profiles may contain changepoints. In this article, we assume a two-phase polynomial random effects with subject-specific changepoint model for the longitudinal data process and the proportional hazards model for the survival times. Our main interest is in the estimation of the parameter in the hazards model. We incorporate a smooth transition function into the changepoint model for the longitudinal data and develop the corrected score and conditional score estimators, which do not require any assumption regarding the underlying distribution of the random effects or that of the changepoints. The estimators are shown to be asymptotically equivalent and their finite-sample performance is examined via simulations. The methods are applied to AIDS clinical trial data.

doi:10.1002/sim.4107

PMCID: PMC3059268
PMID: 21213341

Changepoint; Conditional score; Corrected score; Measurement error; Random effects; Proportional hazards

We consider a random effects quantile regression analysis of clustered data and propose a semiparametric approach using empirical likelihood. The random regression coefficients are assumed independent with a common mean, following parametrically specified distributions. The common mean corresponds to the population-average effects of explanatory variables on the conditional quantile of interest, while the random coefficients represent cluster-specific deviations in the covariate effects. We formulate the estimation of the random coefficients as an estimating equations problem and use empirical likelihood to incorporate the parametric likelihood of the random coefficients. This yields a likelihood-like statistical criterion function, which we show is asymptotically concave in a neighborhood of the true parameter value and motivates its maximizer as a natural estimator. We use Markov Chain Monte Carlo (MCMC) samplers in the Bayesian framework, and propose the resulting quasi-posterior mean as an estimator. We show that the proposed estimator of the population-level parameter is asymptotically normal and the estimators of the random coefficients are shrunk toward the population-level parameter in the first order asymptotic sense. These asymptotic results do not require Gaussian random effects, and the empirical likelihood based likelihood-like criterion function is free of parameters related to the error densities. This makes the proposed approach both flexible and computationally simple. We illustrate the methodology with two real data examples.

doi:10.1198/jasa.2011.tm10470

PMCID: PMC3280824
PMID: 22347760

Empirical likelihood; Markov Chain Monte Carlo; Quasi-posterior distribution
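
For readers less familiar with quantile regression, the sketch below shows the standard check (pinball) loss that defines a conditional quantile. The quantile regression abstract above does not spell this loss out, so it is included here only as background; the function name `check_loss`, the toy data, and the grid search are assumptions made for illustration and are not the empirical likelihood procedure the abstract proposes.

```python
import numpy as np

def check_loss(u, tau):
    """Check loss rho_tau(u) = u * (tau - 1{u < 0}) for residual u and level tau."""
    u = np.asarray(u, dtype=float)
    return u * (tau - (u < 0))

# Toy responses (hypothetical). Minimising the summed check loss over a
# constant q recovers the sample tau-quantile; here tau = 0.5 gives the median.
y = np.array([1.0, 2.0, 3.0, 4.0, 10.0])
grid = np.linspace(0.0, 12.0, 1201)
risk = np.array([check_loss(y - q, 0.5).sum() for q in grid])
q_hat = grid[risk.argmin()]
```

Replacing the constant `q` with a linear predictor gives ordinary quantile regression; the random-coefficient structure described in the abstract is not represented in this sketch.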

Estimation of the covariance structure of longitudinal processes is a fundamental prerequisite for the practical deployment of functional mapping designed to study the genetic regulation and network of quantitative variation in dynamic complex traits. We present a nonparametric approach for estimating the covariance structure of a quantitative trait measured repeatedly at a series of time points. Specifically, we adopt Huang et al.’s (2006a) approach of invoking the modified Cholesky decomposition and converting the problem into modeling a sequence of regressions of responses. A regularized covariance estimator is obtained using a normal penalized likelihood with an L2 penalty. This approach, embedded within a mixture likelihood framework, leads to enhanced accuracy, precision and flexibility of functional mapping while preserving its biological relevance. Simulation studies are performed to reveal the statistical properties and advantages of the proposed method. A real example from a mouse genome project is analyzed to illustrate the utilization of the methodology. The new method will provide a useful tool for genome-wide scanning for the existence and distribution of quantitative trait loci underlying a dynamic trait important to agriculture, biology and health sciences.

doi:10.1111/j.1541-0420.2009.01222.x

PMCID: PMC2987658
PMID: 19302406

Functional Mapping; Quantitative Trait Loci; Covariance Estimation; Longitudinal Data; Multivariate Normal Mixture
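
The modified Cholesky decomposition mentioned in the functional mapping abstract above can be sketched as follows. This is a minimal illustration under the usual convention that a unit lower-triangular matrix T and a diagonal matrix D satisfy T Σ Tᵀ = D, where row t of T holds the negated coefficients from regressing measurement t on its predecessors and D holds the innovation variances. The function name `modified_cholesky` and the AR(1)-type example matrix are assumptions for this sketch; the penalized likelihood estimator in the abstract is not implemented here.

```python
import numpy as np

def modified_cholesky(sigma):
    """Return (T, D) with T unit lower-triangular, D diagonal, T @ sigma @ T.T = D."""
    sigma = np.asarray(sigma, dtype=float)
    p = sigma.shape[0]
    T = np.eye(p)
    d = np.empty(p)
    d[0] = sigma[0, 0]
    for t in range(1, p):
        # Coefficients from regressing measurement t on measurements 0..t-1.
        phi = np.linalg.solve(sigma[:t, :t], sigma[:t, t])
        T[t, :t] = -phi
        # Innovation (prediction-error) variance of measurement t.
        d[t] = sigma[t, t] - sigma[:t, t] @ phi
    return T, np.diag(d)

# Hypothetical AR(1)-type covariance with correlation 0.6 at lag one.
rho = 0.6
idx = np.arange(4)
sigma = rho ** np.abs(idx[:, None] - idx[None, :])
T, D = modified_cholesky(sigma)
```

Because the entries of T are unconstrained regression coefficients and the diagonal of D only needs to be positive, modeling these quantities instead of Σ directly keeps the implied covariance matrix positive definite without further constraints.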

We consider the variable selection problem for a class of statistical models with missing data, including missing covariate and/or response data. We investigate the smoothly clipped absolute deviation penalty (SCAD) and adaptive LASSO and propose a unified model selection and estimation procedure for use in the presence of missing data. We develop a computationally attractive algorithm for simultaneously optimizing the penalized likelihood function and estimating the penalty parameters. In particular, we propose to use a model selection criterion, called the ICQ statistic, for selecting the penalty parameters. We show that the variable selection procedure based on ICQ automatically and consistently selects the important covariates and leads to efficient estimates with oracle properties. The methodology is very general and can be applied to numerous situations involving missing data, from covariates missing at random in arbitrary regression models to nonignorably missing longitudinal responses and/or covariates. Simulations are given to demonstrate the methodology and examine the finite sample performance of the variable selection procedures. Melanoma data from a cancer clinical trial are presented to illustrate the proposed methodology.

PMCID: PMC2844735
PMID: 20336190

EM algorithm; ICQ; missing data; penalized likelihood; variable selection
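
As background for the SCAD penalty named in the variable selection abstract above, the sketch below evaluates the penalty in its standard piecewise form: linear near zero, quadratic in a transition zone, and constant for large coefficients. The function name `scad_penalty` and the default `a = 3.7` (a commonly used value in the SCAD literature) are assumptions for this example; the abstract itself does not state a value, and the ICQ-based tuning it describes is not shown.

```python
import numpy as np

def scad_penalty(theta, lam, a=3.7):
    """SCAD penalty for coefficient(s) theta, tuning parameter lam > 0, shape a > 2."""
    t = np.abs(np.asarray(theta, dtype=float))
    linear = lam * t                                              # |theta| <= lam
    quadratic = (2 * a * lam * t - t**2 - lam**2) / (2 * (a - 1))  # lam < |theta| <= a*lam
    constant = lam**2 * (a + 1) / 2                               # |theta| > a*lam
    return np.where(t <= lam, linear, np.where(t <= a * lam, quadratic, constant))

# Penalty values for a few hypothetical coefficients with lam = 1.
values = scad_penalty([0.5, 1.0, 2.0, 10.0], lam=1.0)
```

The flat region for large coefficients is what distinguishes SCAD from the LASSO penalty: large effects are not shrunk further, which is the property behind the oracle results the abstract refers to.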

In the modeling of longitudinal data from several groups, appropriate handling of the dependence structure is of central importance. Standard methods include specifying a single covariance matrix for all groups or independently estimating the covariance matrix for each group without regard to the others, but when these model assumptions are incorrect, these techniques can lead to biased mean effects or loss of efficiency, respectively. Thus, it is desirable to develop methods to simultaneously estimate the covariance matrix for each group that will borrow strength across groups in a way that is ultimately informed by the data. In addition, for several groups with covariance matrices of even medium dimension, it is difficult to manually select a single best parametric model among the huge number of possibilities given by incorporating structural zeros and/or commonality of individual parameters across groups. In this paper we develop a family of nonparametric priors using the matrix stick-breaking process of Dunson et al. (2008) that seeks to accomplish this task by parameterizing the covariance matrices in terms of the parameters of their modified Cholesky decomposition (Pourahmadi, 1999). We establish some theoretic properties of these priors, examine their effectiveness via a simulation study, and illustrate the priors using data from a longitudinal clinical trial.

doi:10.1093/biomet/ass060

PMCID: PMC3852937
PMID: 24324281

Bayesian nonparametric inference; Cholesky decomposition; matrix stick-breaking process; simultaneous covariance estimation; sparsity