It is a common practice to analyze complex longitudinal data using semiparametric nonlinear mixed-effects (SNLME) models with a normal distribution. Normality assumption of model errors may unrealistically obscure important features of subject variations. To partially explain between- and within-subject variations, covariates are usually introduced in such models, but some covariates may often be measured with substantial errors. Moreover, the responses may be missing and the missingness may be nonignorable. Inferential procedures can be complicated dramatically when data with skewness, missing values, and measurement error are observed. In the literature, there has been considerable interest in accommodating either skewness, incompleteness or covariate measurement error in such models, but there has been relatively little study concerning all three features simultaneously. In this article, our objective is to address the simultaneous impact of skewness, missingness, and covariate measurement error by jointly modeling the response and covariate processes based on a flexible Bayesian SNLME model. The method is illustrated using a real AIDS data set to compare potential models with various scenarios and different distribution specifications.
Bayesian analysis; Covariate measurement errors; Longitudinal data; Missing data; Random-effects models; Skew distributions
Longitudinal data arise frequently in medical studies and it is a common practice to analyze such complex data with nonlinear mixed-effects (NLME) models which enable us to account for between-subject and within-subject variations. To partially explain the variations, covariates are usually introduced to these models. Some covariates, however, may be often measured with substantial errors. It is often the case that model random error is assumed to be distributed normally, but the normality assumption may not always give robust and reliable results, particularly if the data exhibit skewness. Although there has been considerable interest in accommodating either skewness or covariate measurement error in the literature, there is relatively little work that considers both features simultaneously. In this article, our objectives are to address simultaneous impact of skewness and covariate measurement error by jointly modeling the response and covariate processes under a general framework of Bayesian semiparametric nonlinear mixed-effects models. The method is illustrated in an AIDS data example to compare potential models which have different distributional specifications. The findings from this study suggest that the models with a skew-normal distribution may provide more reasonable results if the data exhibit skewness and/or have measurement errors in covariates.
Bayesian approach; Covariate measurement errors; HIV/AIDS; Joint models; Longitudinal data; Semiparametric nonlinear mixed-effects models; Skew-normal distribution
Ordinary least squares linear regression (OLSLR) analyses are inappropriate for performing trend analysis on repeatedly measured longitudinal data. This study examines multilevel linear mixed-effects (LME) and nonlinear mixed-effects (NLME) methods to model longitudinally collected perimetry data and determines whether NLME methods provide significant improvements over LME methods and OLSLR.
Models of LME and NLME (exponential, whereby the rate of change in sensitivity worsens over time) were examined with two levels of nesting (subject and eye within subject) to predict the mean deviation. Models were compared using analysis of variance or Akaike's information criterion and Bayesian information criterion, as appropriate.
Nonlinear (exponential) models provided significantly better fits than linear models (P < 0.0001). Nonlinear fits markedly improved the validity of the model, as evidenced by the lack of significant autocorrelation, residuals that are closer to being normally distributed, and improved homogeneity. From the fitted exponential model, the rate of glaucomatous progression for an average subject of age 70 years was −0.07 decibels (dB) per year. Ten years later, the same eye would be deteriorating at −0.12 dB/y.
Multilevel mixed-effects models provide better fits to the test data than OLSLR by accounting for group effects and/or within-group correlation. However, the fitted LME model poorly tracks visual field (VF) change over time. An exponential model provides a significant improvement over linear models and more accurately tracks VF change over time in this cohort.
OLS methods are inappropriate for performing trend analyses on repeatedly measured longitudinal data. Instead, a nonlinear (exponential) model provides a significant improvement over linear models and more accurately tracks visual field change over time in this cohort.
glaucoma; mean deviation; linear mixed effect; nonlinear mixed effect; autocorrelation
For the analysis of length-of-stay (LOS) data, which is characteristically right-skewed, a number of statistical estimators have been proposed as alternatives to the traditional ordinary least squares (OLS) regression with log dependent variable.
Using a cohort of patients identified in the Australian and New Zealand Intensive Care Society Adult Patient Database, 2008–2009, 12 different methods were used for estimation of intensive care (ICU) length of stay. These encompassed risk-adjusted regression analysis of firstly: log LOS using OLS, linear mixed model [LMM], treatment effects, skew-normal and skew-t models; and secondly: unmodified (raw) LOS via OLS, generalised linear models [GLMs] with log-link and 4 different distributions [Poisson, gamma, negative binomial and inverse-Gaussian], extended estimating equations [EEE] and a finite mixture model including a gamma distribution. A fixed covariate list and ICU-site clustering with robust variance were utilised for model fitting with split-sample determination (80%) and validation (20%) data sets, and model simulation was undertaken to establish over-fitting (Copas test). Indices of model specification using Bayesian information criterion [BIC: lower values preferred] and residual analysis as well as predictive performance (R2, concordance correlation coefficient (CCC), mean absolute error [MAE]) were established for each estimator.
The data-set consisted of 111663 patients from 131 ICUs; with mean(SD) age 60.6(18.8) years, 43.0% were female, 40.7% were mechanically ventilated and ICU mortality was 7.8%. ICU length-of-stay was 3.4(5.1) (median 1.8, range (0.17-60)) days and demonstrated marked kurtosis and right skew (29.4 and 4.4 respectively). BIC showed considerable spread, from a maximum of 509801 (OLS-raw scale) to a minimum of 210286 (LMM). R2 ranged from 0.22 (LMM) to 0.17 and the CCC from 0.334 (LMM) to 0.149, with MAE 2.2-2.4. Superior residual behaviour was established for the log-scale estimators. There was a general tendency for over-prediction (negative residuals) and for over-fitting, the exception being the GLM negative binomial estimator. The mean-variance function was best approximated by a quadratic function, consistent with log-scale estimation; the link function was estimated (EEE) as 0.152(0.019, 0.285), consistent with a fractional-root function.
For ICU length of stay, log-scale estimation, in particular the LMM, appeared to be the most consistently performing estimator(s). Neither the GLM variants nor the skew-regression estimators dominated.
There has been great public health interest in estimating usual, i.e., long-term average, intake of episodically consumed dietary components that are not consumed daily by everyone, e.g., fish, red meat and whole grains. Short-term measurements of episodically consumed dietary components have zero-inflated skewed distributions. So-called two-part models have been developed for such data in order to correct for measurement error due to within-person variation and to estimate the distribution of usual intake of the dietary component in the univariate case. However, there is arguably much greater public health interest in the usual intake of an episodically consumed dietary component adjusted for energy (caloric) intake, e.g., ounces of whole grains per 1000 kilo-calories, which reflects usual dietary composition and adjusts for different total amounts of caloric intake. Because of this public health interest, it is important to have models to fit such data, and it is important that the model-fitting methods can be applied to all episodically consumed dietary components.
We have recently developed a nonlinear mixed effects model (Kipnis, et al., 2010), and have fit it by maximum likelihood using nonlinear mixed effects programs and methodology (the SAS NLMIXED procedure). Maximum likelihood fitting of such a nonlinear mixed model is generally slow because of 3-dimensional adaptive Gaussian quadrature, and there are times when the programs either fail to converge or converge to models with a singular covariance matrix. For these reasons, we develop a Monte-Carlo (MCMC) computation of fitting this model, which allows for both frequentist and Bayesian inference. There are technical challenges to developing this solution because one of the covariance matrices in the model is patterned. Our main application is to the National Institutes of Health (NIH)-AARP Diet and Health Study, where we illustrate our methods for modeling the energy-adjusted usual intake of fish and whole grains. We demonstrate numerically that our methods lead to increased speed of computation, converge to reasonable solutions, and have the flexibility to be used in either a frequentist or a Bayesian manner.
Bayesian approach; latent variables; measurement error; mixed effects models; nutritional epidemiology; zero-inflated data
Often in biomedical studies, the routine use of linear mixed-effects models (based on Gaussian assumptions) can be questionable when the longitudinal responses are skewed in nature. Skew-normal/elliptical models are widely used in those situations. Often, those skewed responses might also be subjected to some upper and lower quantification limits (viz. longitudinal viral load measures in HIV studies), beyond which they are not measurable. In this paper, we develop a Bayesian analysis of censored linear mixed models replacing the Gaussian assumptions with skew-normal/independent (SNI) distributions. The SNI is an attractive class of asymmetric heavy-tailed distributions that includes the skew-normal, the skew-t, skew-slash and the skew-contaminated normal distributions as special cases. The proposed model provides flexibility in capturing the effects of skewness and heavy tail for responses which are either left- or right-censored. For our analysis, we adopt a Bayesian framework and develop a MCMC algorithm to carry out the posterior analyses. The marginal likelihood is tractable, and utilized to compute not only some Bayesian model selection measures but also case-deletion influence diagnostics based on the Kullback-Leibler divergence. The newly developed procedures are illustrated with a simulation study as well as a HIV case study involving analysis of longitudinal viral loads.
Bayesian inference; Detection limit; HIV viral load; Linear mixed models; Skew-normal/independent distribution
Bivariate clustered (correlated) data often encountered in epidemiological and clinical research are routinely analyzed under a linear mixed model framework with underlying normality assumptions of the random effects and within-subject errors. However, such normality assumptions might be questionable if the data-set particularly exhibit skewness and heavy tails. Using a Bayesian paradigm, we use the skew-normal/independent (SNI) distribution as a tool for modeling clustered data with bivariate non-normal responses in a linear mixed model framework. The SNI distribution is an attractive class of asymmetric thick-tailed parametric structure which includes the skew-normal distribution as a special case. We assume that the random effects follows multivariate skew-normal/independent distributions and the random errors follow symmetric normal/independent distributions which provides substantial robustness over the symmetric normal process in a linear mixed model framework. Specific distributions obtained as special cases, viz. the skew-t, the skew-slash and the skew-contaminated normal distributions are compared, along with the default skew-normal density. The methodology is illustrated through an application to a real data which records the periodontal health status of an interesting population using periodontal pocket depth (PPD) and clinical attachment level (CAL).
Bayesian; linear mixed model; MCMC; normal/independent distributions; skewness
Linear mixed effects (LME) models are useful for longitudinal data/repeated measurements. We propose a new class of covariate-adjusted LME models for longitudinal data that nonparametrically adjusts for a normalizing covariate. The proposed approach involves fitting a parametric LME model to the data after adjusting for the nonparametric effects of a baseline confounding covariate. In particular, the effect of the observable covariate on the response and predictors of the LME model is modeled nonparametrically via smooth unknown functions. In addition to covariate-adjusted estimation of fixed/population parameters and random effects, an estimation procedure for the variance components is also developed. Numerical properties of the proposed estimators are investigated with simulation studies. The consistency and convergence rates of the proposed estimators are also established. An application to a longitudinal data set on calcium absorption, accounting for baseline distortion from body mass index, illustrates the proposed methodology.
Binning; Covariance structure; Covariate-adjusted regression (CAR); Longitudinal data; Mixed model; Multiplicative effect; Varying coefficient models
We consider a semiparametric regression model that relates a normal outcome to covariates and a genetic pathway, where the covariate effects are modeled parametrically and the pathway effect of multiple gene expressions is modeled parametrically or nonparametrically using least-squares kernel machines (LSKMs). This unified framework allows a flexible function for the joint effect of multiple genes within a pathway by specifying a kernel function and allows for the possibility that each gene expression effect might be nonlinear and the genes within the same pathway are likely to interact with each other in a complicated way. This semiparametric model also makes it possible to test for the overall genetic pathway effect. We show that the LSKM semiparametric regression can be formulated using a linear mixed model. Estimation and inference hence can proceed within the linear mixed model framework using standard mixed model software. Both the regression coefficients of the covariate effects and the LSKM estimator of the genetic pathway effect can be obtained using the best linear unbiased predictor in the corresponding linear mixed model formulation. The smoothing parameter and the kernel parameter can be estimated as variance components using restricted maximum likelihood. A score test is developed to test for the genetic pathway effect. Model/variable selection within the LSKM framework is discussed. The methods are illustrated using a prostate cancer data set and evaluated using simulations.
BLUPs; Kernel function; Model/variable selection; Nonparametric regression; Penalized likelihood; REML; Score test; Smoothing parameter; Support vector machines
In some clinical trials and epidemiologic studies, investigators are interested in knowing whether the variability of a biomarker is independently predictive of clinical outcomes. This question is often addressed via a naïve approach where a sample-based estimate (e.g., standard deviation) is calculated as a surrogate for the “true” variability and then used in regression models as a covariate assumed to be free of measurement error. However, it is well known that the measurement error in covariates causes underestimation of the true association. The issue of underestimation can be substantial when the precision is low because of limited number of measures per subject. The joint analysis of survival data and longitudinal data enables one to account for the measurement error in longitudinal data and has received substantial attention in recent years. In this paper we propose a joint model to assess the predictive effect of biomarker variability. The joint model consists of two linked sub-models, a linear mixed model with patient-specific variance for longitudinal data and a full parametric Weibull distribution for survival data, and the association between two models is induced by a latent Gaussian process. Parameters in the joint model are estimated under Bayesian framework and implemented using Markov chain Monte Carlo (MCMC) methods with WinBUGS software. The method is illustrated in the Ocular Hypertension Treatment Study to assess whether the variability of intraocular pressure is an independent risk of primary open-angle glaucoma. The performance of the method is also assessed by simulation studies.
Patient-specific variance; Survival data; Longitudinal data; Joint model; Markov chain Monte Carlo (MCMC); WinBUGS
Linear mixed effects (LME) models are increasingly used for analyses of biological and biomedical data. When the multivariate normal assumption is not adequate for an LME model, then a robust estimation approach is preferable to the maximum likelihood one. M-estimators were considered before for robust estimation of the LME models, and recently a constrained S-estimator was proposed. This S-estimator can not be applied directly to LME models with correlated error terms and vector random effects with correlated dimensions. Therefore, a modification is proposed, which extends application of the constrained S-estimator to the LME models for multivariate responses with correlated dimensions and to longitudinal data. Also a new computational algorithm is developed for computing constrained S-estimators. Performance of the S-estimators based on the original Tukey’s biweight and translated biweight is evaluated in a small simulation study with repeated multivariate responses with correlated dimensions. Proposed methodology is applied to jointly analyze repeated measures on three cholesterol components, HDL, LDL, and triglycerides.
Multivariate linear mixed effects models; robust estimation; CTBS estimator for LME model; M-estimator
Conventional group analysis is usually performed with Student-type t-test, regression, or standard AN(C)OVA in which the variance–covariance matrix is presumed to have a simple structure. Some correction approaches are adopted when assumptions about the covariance structure is violated. However, as experiments are designed with different degrees of sophistication, these traditional methods can become cumbersome, or even be unable to handle the situation at hand. For example, most current FMRI software packages have difficulty analyzing the following scenarios at group level: (1) taking within-subject variability into account when there are effect estimates from multiple runs or sessions; (2) continuous explanatory variables (covariates) modeling in the presence of a within-subject (repeated measures) factor, multiple subject-grouping (between-subjects) factors, or the mixture of both; (3) subject-specific adjustments in covariate modeling; (4) group analysis with estimation of hemodynamic response (HDR) function by multiple basis functions; (5) various cases of missing data in longitudinal studies; and (6) group studies involving family members or twins.
Here we present a linear mixed-effects modeling (LME) methodology that extends the conventional group analysis approach to analyze many complicated cases, including the six prototypes delineated above, whose analyses would be otherwise either difficult or unfeasible under traditional frameworks such as AN(C)OVA and general linear model (GLM). In addition, the strength of the LME framework lies in its flexibility to model and estimate the variance–covariance structures for both random effects and residuals. The intraclass correlation (ICC) values can be easily obtained with an LME model with crossed random effects, even at the presence of confounding fixed effects. The simulations of one prototypical scenario indicate that the LME modeling keeps a balance between the control for false positives and the sensitivity for activation detection. The importance of hypothesis formulation is also illustrated in the simulations. Comparisons with alternative group analysis approaches and the limitations of LME are discussed in details.
FMRI group analysis; GLM; AN(C)OVA; LME; ICC; AFNI; R
Assays to measure concentration of antibody after vaccination are often subject to left-censoring due to a lower detection limit (LDL), leading to a high proportion of observations below the detection limit. Not accounting for such left-censoring appropriately can lead to biased parameter estimates. To properly adjust for left-censoring and a high proportion of observations at LDL, this paper proposes a mixture model combining a point mass below LDL and a Tobit model with skew-elliptical error distribution. We show that skew-elliptical distributions, where the skew-normal and skew-t are special cases, have great flexibility for simultaneously handling left-censoring, skewness and heaviness in the tails of a distribution of a response variable with left-censored data. A Bayesian procedure is used to estimate model parameters. Two real datasets from a study of measles vaccine and an HIV/AIDS study are used to illustrate the proposed models.
Bayesian inference; Censoring; Mixed-effects models; Skew-normal distribution; Tobit model
We propose a semiparametric Bayesian method for handling measurement error in nutritional epidemiological data. Our goal is to estimate nonparametrically the form of association between a disease and exposure variable while the true values of the exposure are never observed. Motivated by nutritional epidemiological data we consider the setting where a surrogate covariate is recorded in the primary data, and a calibration data set contains information on the surrogate variable and repeated measurements of an unbiased instrumental variable of the true exposure. We develop a flexible Bayesian method where not only is the relationship between the disease and exposure variable treated semiparametrically, but also the relationship between the surrogate and the true exposure is modeled semiparametrically. The two nonparametric functions are modeled simultaneously via B-splines. In addition, we model the distribution of the exposure variable as a Dirichlet process mixture of normal distributions, thus making its modeling essentially nonparametric and placing this work into the context of functional measurement error modeling. We apply our method to the NIH-AARP Diet and Health Study and examine its performance in a simulation study.
B-splines; Dirichlet process prior; Gibbs sampling; Measurement error; Metropolis-Hastings algorithm; Partly linear model
This article proposes a joint model for longitudinal measurements and competing risks survival data. The model consists of a linear mixed effects sub-model with t-distributed measurement errors for the longitudinal outcome, a proportional cause-specific hazards frailty sub-model for the survival outcome, and a regression sub-model for the variance-covariance matrix of the multivariate latent random effects based on a modified Cholesky decomposition. A Bayesian MCMC procedure is developed for parameter estimation and inference. Our method is insensitive to outlying longitudinal measurements in the presence of non-ignorable missing data due to dropout. Moreover, by modeling the variance-covariance matrix of the latent random effects, our model provides a useful framework for handling high-dimensional heterogeneous random effects and testing the homogeneous random effects assumption which is otherwise untestable in commonly used joint models. Finally, our model enables analysis of a survival outcome with intermittently measured time-dependent covariates and possibly correlated competing risks and dependent censoring, as well as joint analysis of the longitudinal and survival outcomes. Illustrations are given using a real data set from a lung study and simulation.
Joint model; Competing risks; Bayesian analysis; Cholesky decomposition; Mixed effects model; MCMC; Modeling random effects covariance matrix; Outlier
This paper considers identification and estimation of a general nonlinear Errors-in-Variables (EIV) model using two samples. Both samples consist of a dependent variable, some error-free covariates, and an error-prone covariate, for which the measurement error has unknown distribution and could be arbitrarily correlated with the latent true values; and neither sample contains an accurate measurement of the corresponding true variable. We assume that the regression model of interest — the conditional distribution of the dependent variable given the latent true covariate and the error-free covariates — is the same in both samples, but the distributions of the latent true covariates vary with observed error-free discrete covariates. We first show that the general latent nonlinear model is nonparametrically identified using the two samples when both could have nonclassical errors, without either instrumental variables or independence between the two samples. When the two samples are independent and the nonlinear regression model is parameterized, we propose sieve Quasi Maximum Likelihood Estimation (Q-MLE) for the parameter of interest, and establish its root-n consistency and asymptotic normality under possible misspecification, and its semiparametric efficiency under correct specification, with easily estimated standard errors. A Monte Carlo simulation and a data application are presented to show the power of the approach.
Data combination; Measurement error; Misspecified parametric latent model; Nonclassical measurement error; Nonlinear errors-in-variables model; Nonparametric identification; Sieve quasi likelihood
The purpose of this study was to describe the nonlinear pharmacokinetics of piperacillin observed during intermittent infusion and continuous infusion by using a nonparametric population modeling approach. Data were 120 serum piperacillin concentration measurements from eight adult cystic fibrosis (CF) patients. Individual pharmacokinetic parameter estimates during intermittent infusion or continuous infusion were calculated by noncompartmental analysis and with a maximum iterative two-stage Bayesian estimator. To simultaneously describe concentration-time data during intermittent infusion and continuous infusion, nonlinear models were parameterized as two-compartment Michaelis-Menten models. Models were fit to the data with the nonparametric expectation maximization algorithm. The calculations were executed on a remote supercomputer. Nonlinear models were evaluated by log-likelihood estimates, residual plots, and R2 values, and predictive performance was based on bias (mean weighted error [MWE]) and precision (mean weighted square error [MWSE]). A linear pharmacokinetic model could not describe combined intermittent infusion and continuous infusion data well. A good population model fit to the intermittent infusion and continuous infusion data was obtained with the constructed nonlinear models. Maximum a posteriori probability (MAP) Bayesian R2 values for the nonlinear models were 0.96 to 0.97. Median parameter estimates for the best nonlinear model were as follows: Km, 58 ± 75 mg/liter (mean and standard deviation); Vmax, 1,904 ± 1,009 mg/h; volume of distribution of the central compartment, 14.1 ± 3.0 liters; k12, 0.63 ± 0.41 h−1; and k21, 0.37 ± 0.19 h−1. The median bias (MWE) and precision (MWSE) values for MAP Bayesian estimation with the Michaelis-Menten model were 0.05 and 4.6 mg/liters, respectively. The developed nonlinear pharmacokinetic models can be used to optimize piperacillin therapy administered via continuous infusion in patients with CF and have distinct advantages over conventional linear models.
The standard methods for regression analyses of clustered riverine larval habitat data of Simulium damnosum s.l. a major black-fly vector of Onchoceriasis, postulate models relating observational ecological-sampled parameter estimators to prolific habitats without accounting for residual intra-cluster error correlation effects. Generally, this correlation comes from two sources: (1) the design of the random effects and their assumed covariance from the multiple levels within the regression model; and, (2) the correlation structure of the residuals. Unfortunately, inconspicuous errors in residual intra-cluster correlation estimates can overstate precision in forecasted S.damnosum s.l. riverine larval habitat explanatory attributes regardless how they are treated (e.g., independent, autoregressive, Toeplitz, etc). In this research, the geographical locations for multiple riverine-based S. damnosum s.l. larval ecosystem habitats sampled from 2 pre-established epidemiological sites in Togo were identified and recorded from July 2009 to June 2010. Initially the data was aggregated into proc genmod. An agglomerative hierarchical residual cluster-based analysis was then performed. The sampled clustered study site data was then analyzed for statistical correlations using Monthly Biting Rates (MBR). Euclidean distance measurements and terrain-related geomorphological statistics were then generated in ArcGIS. A digital overlay was then performed also in ArcGIS using the georeferenced ground coordinates of high and low density clusters stratified by Annual Biting Rates (ABR). This data was overlain onto multitemporal sub-meter pixel resolution satellite data (i.e., QuickBird 0.61m wavbands ). Orthogonal spatial filter eigenvectors were then generated in SAS/GIS. Univariate and non-linear regression-based models (i.e., Logistic, Poisson and Negative Binomial) were also employed to determine probability distributions and to identify statistically significant parameter estimators from the sampled data. Thereafter, Durbin-Watson test statistics were used to test the null hypothesis that the regression residuals were not autocorrelated against the alternative that the residuals followed an autoregressive process in AUTOREG. Bayesian uncertainty matrices were also constructed employing normal priors for each of the sampled estimators in PROC MCMC. The residuals revealed both spatially structured and unstructured error effects in the high and low ABR-stratified clusters. The analyses also revealed that the estimators, levels of turbidity and presence of rocks were statistically significant for the high-ABR-stratified clusters, while the estimators distance between habitats and floating vegetation were important for the low-ABR-stratified cluster. Varying and constant coefficient regression models, ABR- stratified GIS-generated clusters, sub-meter resolution satellite imagery, a robust residual intra-cluster diagnostic test, MBR-based histograms, eigendecomposition spatial filter algorithms and Bayesian matrices can enable accurate autoregressive estimation of latent uncertainity affects and other residual error probabilities (i.e., heteroskedasticity) for testing correlations between georeferenced S. damnosum s.l. riverine larval habitat estimators. The asymptotic distribution of the resulting residual adjusted intra-cluster predictor error autocovariate coefficients can thereafter be established while estimates of the asymptotic variance can lead to the construction of approximate confidence intervals for accurately targeting productive S. damnosum s.l habitats based on spatiotemporal field-sampled count data.
Simulium damnosum s.l.; cluster covariates; QuickBird; onchoceriasis; annual biting rates; Bayesian; Togo
Logistic random effects models are a popular tool to analyze multilevel also called hierarchical data with a binary or ordinal outcome. Here, we aim to compare different statistical software implementations of these models.
We used individual patient data from 8509 patients in 231 centers with moderate and severe Traumatic Brain Injury (TBI) enrolled in eight Randomized Controlled Trials (RCTs) and three observational studies. We fitted logistic random effects regression models with the 5-point Glasgow Outcome Scale (GOS) as outcome, both dichotomized as well as ordinal, with center and/or trial as random effects, and as covariates age, motor score, pupil reactivity or trial. We then compared the implementations of frequentist and Bayesian methods to estimate the fixed and random effects. Frequentist approaches included R (lme4), Stata (GLLAMM), SAS (GLIMMIX and NLMIXED), MLwiN ([R]IGLS) and MIXOR, Bayesian approaches included WinBUGS, MLwiN (MCMC), R package MCMCglmm and SAS experimental procedure MCMC.
Three data sets (the full data set and two sub-datasets) were analysed using basically two logistic random effects models with either one random effect for the center or two random effects for center and trial. For the ordinal outcome in the full data set also a proportional odds model with a random center effect was fitted.
The packages gave similar parameter estimates for both the fixed and random effects and for the binary (and ordinal) models for the main study and when based on a relatively large number of level-1 (patient level) data compared to the number of level-2 (hospital level) data. However, when based on relatively sparse data set, i.e. when the numbers of level-1 and level-2 data units were about the same, the frequentist and Bayesian approaches showed somewhat different results. The software implementations differ considerably in flexibility, computation time, and usability. There are also differences in the availability of additional tools for model evaluation, such as diagnostic plots. The experimental SAS (version 9.2) procedure MCMC appeared to be inefficient.
On relatively large data sets, the different software implementations of logistic random effects regression models produced similar results. Thus, for a large data set there seems to be no explicit preference (of course if there is no preference from a philosophical point of view) for either a frequentist or Bayesian approach (if based on vague priors). The choice for a particular implementation may largely depend on the desired flexibility, and the usability of the package. For small data sets the random effects variances are difficult to estimate. In the frequentist approaches the MLE of this variance was often estimated zero with a standard error that is either zero or could not be determined, while for Bayesian methods the estimates could depend on the chosen "non-informative" prior of the variance parameter. The starting value for the variance parameter may be also critical for the convergence of the Markov chain.
Malaria is a major public health problem in Malawi, however, quantifying its burden in a population is a challenge. Routine hospital data provide a proxy for measuring the incidence of severe malaria and for crudely estimating morbidity rates. Using such data, this paper proposes a method to describe trends, patterns and factors associated with in-hospital mortality attributed to the disease.
We develop semiparametric regression models which allow joint analysis of nonlinear effects of calendar time and continuous covariates, spatially structured variation, unstructured heterogeneity, and other fixed covariates. Modelling and inference use the fully Bayesian approach via Markov Chain Monte Carlo (MCMC) simulation techniques. The methodology is applied to analyse data arising from paediatric wards in Zomba district, Malawi, between 2002 and 2003.
Results and Conclusion
We observe that the risk of dying in hospital is lower in the dry season, and for children who travel a distance of less than 5 kms to the hospital, but increases for those who are referred to the hospital. The results also indicate significant differences in both structured and unstructured spatial effects, and the health facility effects reveal considerable differences by type of facility or practice. More importantly, our approach shows non-linearities in the effect of metrical covariates on the probability of dying in hospital. The study emphasizes that the methodological framework used provides a useful tool for analysing the data at hand and of similar structure.
Preterm birth, defined as delivery before 37 completed weeks’ gestation, is a leading cause of infant morbidity and mortality. Identifying factors related to preterm delivery is an important goal of public health professionals who wish to identify etiologic pathways to target for prevention. Validation studies are often conducted in nutritional epidemiology in order to study measurement error in instruments that are generally less invasive or less expensive than ”gold standard” instruments. Data from such studies are then used in adjusting estimates based on the full study sample. However, measurement error in nutritional epidemiology has recently been shown to be complicated by correlated error structures in the study-wide and validation instruments. Investigators of a study of preterm birth and dietary intake designed a validation study to assess measurement error in a food frequency questionnaire (FFQ) administered during pregnancy and with the secondary goal of assessing whether a single administration of the FFQ could be used to describe intake over the relatively short pregnancy period, in which energy intake typically increases. Here, we describe a likelihood-based method via Markov Chain Monte Carlo to estimate the regression coefficients in a generalized linear model relating preterm birth to covariates, where one of the covariates is measured with error and the multivariate measurement error model has correlated errors among contemporaneous instruments (i.e. FFQs, 24-hour recalls, and/or biomarkers). Because of constraints on the covariance parameters in our likelihood, identifiability for all the variance and covariance parameters is not guaranteed and, therefore, we derive the necessary and suficient conditions to identify the variance and covariance parameters under our measurement error model and assumptions. We investigate the sensitivity of our likelihood-based model to distributional assumptions placed on the true folate intake by employing semi-parametric Bayesian methods through the mixture of Dirichlet process priors framework. We exemplify our methods in a recent prospective cohort study of risk factors for preterm birth. We use long-term folate as our error-prone predictor of interest, the food-frequency questionnaire (FFQ) and 24-hour recall as two biased instruments, and serum folate biomarker as the unbiased instrument. We found that folate intake, as measured by the FFQ, led to a conservative estimate of the estimated odds ratio of preterm birth (0.76) when compared to the odds ratio estimate from our likelihood-based approach, which adjusts for the measurement error (0.63). We found that our parametric model led to similar conclusions to the semi-parametric Bayesian model.
Adaptive-Rejection Sampling; Dirichlet process prior; MCMC; Semiparametric Bayes
Health-related quality of life (HRQL) has become an increasingly important outcome parameter in clinical trials and epidemiological research. HRQL scores are typically bounded at both ends of the scale and often highly skewed. Several regression techniques have been proposed to model such data in cross-sectional studies, however, methods applicable in longitudinal research are less well researched. This study examined the use of beta regression models for analyzing longitudinal HRQL data using two empirical examples with distributional features typically encountered in practice.
We used SF-6D utility data from a German older age cohort study and stroke-specific HRQL data from a randomized controlled trial. We described the conceptual differences between mixed and marginal beta regression models and compared both models to the commonly used linear mixed model in terms of overall fit and predictive accuracy.
At any measurement time, the beta distribution fitted the SF-6D utility data and stroke-specific HRQL data better than the normal distribution. The mixed beta model showed better likelihood-based fit statistics than the linear mixed model and respected the boundedness of the outcome variable. However, it tended to underestimate the true mean at the upper part of the distribution. Adjusted group means from marginal beta model and linear mixed model were nearly identical but differences could be observed with respect to standard errors.
Understanding the conceptual differences between mixed and marginal beta regression models is important for their proper use in the analysis of longitudinal HRQL data. Beta regression fits the typical distribution of HRQL data better than linear mixed models, however, if focus is on estimating group mean scores rather than making individual predictions, the two methods might not differ substantially.
Health-related quality of life; Beta regression; Longitudinal study; Mixed model; Marginal model
The distribution of residual effects in linear mixed models in animal breeding applications is typically assumed normal, which makes inferences vulnerable to outlier observations. In order to mute the impact of outliers, one option is to fit models with residuals having a heavy-tailed distribution. Here, a Student's-t model was considered for the distribution of the residuals with the degrees of freedom treated as unknown. Bayesian inference was used to investigate a bivariate Student's-t (BSt) model using Markov chain Monte Carlo methods in a simulation study and analysing field data for gestation length and birth weight permitted to study the practical implications of fitting heavy-tailed distributions for residuals in linear mixed models.
In the simulation study, bivariate residuals were generated using Student's-t distribution with 4 or 12 degrees of freedom, or a normal distribution. Sire models with bivariate Student's-t or normal residuals were fitted to each simulated dataset using a hierarchical Bayesian approach. For the field data, consisting of gestation length and birth weight records on 7,883 Italian Piemontese cattle, a sire-maternal grandsire model including fixed effects of sex-age of dam and uncorrelated random herd-year-season effects were fitted using a hierarchical Bayesian approach. Residuals were defined to follow bivariate normal or Student's-t distributions with unknown degrees of freedom.
Posterior mean estimates of degrees of freedom parameters seemed to be accurate and unbiased in the simulation study. Estimates of sire and herd variances were similar, if not identical, across fitted models. In the field data, there was strong support based on predictive log-likelihood values for the Student's-t error model. Most of the posterior density for degrees of freedom was below 4. Posterior means of direct and maternal heritabilities for birth weight were smaller in the Student's-t model than those in the normal model. Re-rankings of sires were observed between heavy-tailed and normal models.
Reliable estimates of degrees of freedom were obtained in all simulated heavy-tailed and normal datasets. The predictive log-likelihood was able to distinguish the correct model among the models fitted to heavy-tailed datasets. There was no disadvantage of fitting a heavy-tailed model when the true model was normal. Predictive log-likelihood values indicated that heavy-tailed models with low degrees of freedom values fitted gestation length and birth weight data better than a model with normally distributed residuals.
Heavy-tailed and normal models resulted in different estimates of direct and maternal heritabilities, and different sire rankings. Heavy-tailed models may be more appropriate for reliable estimation of genetic parameters from field data.
We consider Bayesian inference in semiparametric mixed models (SPMMs) for longitudinal data. SPMMs are a class of models that use a nonparametric function to model a time effect, a parametric function to model other covariate effects, and parametric or nonparametric random effects to account for the within-subject correlation. We model the nonparametric function using a Bayesian formulation of a cubic smoothing spline, and the random effect distribution using a normal distribution and alternatively a nonparametric Dirichlet process (DP) prior. When the random effect distribution is assumed to be normal, we propose a uniform shrinkage prior (USP) for the variance components and the smoothing parameter. When the random effect distribution is modeled nonparametrically, we use a DP prior with a normal base measure and propose a USP for the hyperparameters of the DP base measure. We argue that the commonly assumed DP prior implies a nonzero mean of the random effect distribution, even when a base measure with mean zero is specified. This implies weak identifiability for the fixed effects, and can therefore lead to biased estimators and poor inference for the regression coefficients and the spline estimator of the nonparametric function. We propose an adjustment using a postprocessing technique. We show that under mild conditions the posterior is proper under the proposed USP, a flat prior for the fixed effect parameters, and an improper prior for the residual variance. We illustrate the proposed approach using a longitudinal hormone dataset, and carry out extensive simulation studies to compare its finite sample performance with existing methods.
Dirichlet process prior; Identifiability; Postprocessing; Random effects; Smoothing spline; Uniform shrinkage prior; Variance components
This study investigated the influence of P-glycoprotein (P-gp) inhibitor tariquidar on the pharmacokinetics of P-gp substrate radiotracer (R)-[11C]verapamil in plasma and brain of rats and humans by means of positron emission tomography (PET).
Data obtained from a preclinical and clinical study, in which paired (R)-[11C]verapamil PET scans were performed before, during, and after tariquidar administration, were analyzed using nonlinear mixed effects (NLME) modeling. Administration of tariquidar was included as a covariate on the influx and efflux parameters (Qin and Qout) in order to investigate if tariquidar increased influx or decreased outflux of radiotracer across the blood–brain barrier (BBB). Additionally, the influence of pilocarpine-induced status epilepticus (SE) was tested on all model parameters, and the brain-to-plasma partition coefficient (VT-NLME) was calculated.
Our model indicated that tariquidar enhances brain uptake of (R)-[11C]verapamil by decreasing Qout. The reduction in Qout in rats during and immediately after tariquidar administration (sevenfold) was more pronounced than in the second PET scan acquired 2 h after tariquidar administration (fivefold). The effect of tariquidar on Qout in humans was apparent during and immediately after tariquidar administration (twofold reduction in Qout) but was negligible in the second PET scan. SE was found to influence the pharmacological volume of distribution of the central brain compartment Vbr1. Tariquidar treatment lead to an increase in VT-NLME, and pilocarpine-induced SE lead to increased (R)-[11C]verapamil distribution to the peripheral brain compartment.
Using NLME modeling, we were able to provide mechanistic insight into the effects of tariquidar and SE on (R)-[11C]verapamil transport across the BBB in control and 48 h post SE rats as well as in humans.
Nonlinear mixed effects modeling; Positron emission tomography; (R)-[11C]verapamil; P-glycoprotein; Tariquidar; Pilocarpine-induced epilepsy; Species differences