Random effects models are commonly used to analyze longitudinal categorical data. Marginalized random effects models are a class of models that permit direct estimation of marginal mean parameters and characterize serial correlation for longitudinal categorical data via random effects (Heagerty, 1999. Marginally specified logistic-normal models for longitudinal binary data. Biometrics 55, 688–698; Lee and Daniels, 2008. Marginalized models for longitudinal ordinal data with application to quality of life studies. Statistics in Medicine 27, 4359–4380). In this paper, we propose a Kronecker product (KP) covariance structure to capture the correlation between processes at a given time and the correlation within a process over time (serial correlation) for bivariate longitudinal ordinal data. For the latter, we consider a more general class of models than standard (first-order) autoregressive correlation models by re-parameterizing the correlation matrix using partial autocorrelations (Daniels and Pourahmadi, 2009. Modeling covariance matrices via partial autocorrelations. Journal of Multivariate Analysis 100, 2352–2363). We assess the reasonableness of the KP structure with a score test. A maximum marginal likelihood estimation method is proposed utilizing a quasi-Newton algorithm with quasi-Monte Carlo integration of the random effects. We examine the effects of demographic factors on metabolic syndrome and C-reactive protein using the proposed models.
Kronecker product; Metabolic syndrome; Partial autocorrelation
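The KP structure can be illustrated numerically. The following Python example is a minimal sketch, not the authors' code: it maps a matrix of partial autocorrelations (PACs) to a serial correlation matrix with the standard lag-by-lag recursion, under which any PAC values in (−1, 1) give a valid (positive definite) correlation matrix, and then takes the Kronecker product with a 2 × 2 between-process correlation matrix. The function names and example values are hypothetical. Setting the lag-1 PACs to ρ and all higher-lag PACs to zero recovers the first-order autoregressive (AR(1)) structure, which is the special case the PAC parameterization generalizes.

```python
import numpy as np

def pac_to_corr(pac):
    """Map partial autocorrelations to a T x T correlation matrix.

    pac[j, m] (j < m) is the correlation between times j and m given the
    intervening times j+1, ..., m-1; only the upper triangle is used.
    """
    T = pac.shape[0]
    R = np.eye(T)
    for lag in range(1, T):                 # fill the matrix one lag at a time
        for j in range(T - lag):
            m = j + lag
            if lag == 1:
                # Lag-1 partial autocorrelations are ordinary correlations.
                R[j, m] = R[m, j] = pac[j, m]
                continue
            idx = list(range(j + 1, m))     # intervening time points
            r1, r3 = R[j, idx], R[m, idx]
            R2_inv = np.linalg.inv(R[np.ix_(idx, idx)])
            scale = np.sqrt((1 - r1 @ R2_inv @ r1) * (1 - r3 @ R2_inv @ r3))
            R[j, m] = R[m, j] = r1 @ R2_inv @ r3 + pac[j, m] * scale
    return R

def kp_correlation(between, pac):
    """Kronecker product of the between-process and serial correlation matrices."""
    return np.kron(between, pac_to_corr(pac))

# Hypothetical example: 4 times, lag-1 PACs of 0.6, higher-lag PACs of 0 (AR(1)).
T, rho = 4, 0.6
pac = np.zeros((T, T))
for j in range(T - 1):
    pac[j, j + 1] = rho
between = np.array([[1.0, 0.4], [0.4, 1.0]])   # correlation between the two processes
Sigma = kp_correlation(between, pac)            # 8 x 8, ordered (process, time)
```

Because each PAC is unconstrained apart from lying in (−1, 1), this parameterization is convenient for estimation: no positive-definiteness constraint has to be enforced separately.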
In longitudinal clinical trials, if a subject drops out due to death, certain responses, such as those measuring quality of life (QOL), will not be defined after the time of death. Thus, standard missing data analyses, e.g., under ignorable dropout, are problematic because these approaches implicitly ‘impute’ values of the response after death. In this paper we define a new survivors average causal effect for a bivariate response in a longitudinal quality of life study that had a high dropout rate with the dropout often due to death (or tumor progression). We show how principal stratification, with a few sensitivity parameters, can be used to draw causal inferences about the joint distribution of these two ordinal quality of life measures.
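For reference, the standard survivor average causal effect for a univariate outcome is shown below, where S(z) indicates survival and Y(z) is the response under randomized arm z. This is the usual definition from the principal stratification literature; the paper's bivariate version, defined for the joint distribution of the two ordinal measures, is not reproduced here. Because membership in the always-survivor stratum S(0) = S(1) = 1 is not observed, identifying such an effect requires additional assumptions or sensitivity parameters.

```latex
\mathrm{SACE} = E\left[\, Y(1) - Y(0) \mid S(0) = S(1) = 1 \,\right]
```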
Joint modeling of longitudinal and survival data has been increasingly used in clinical trials, notably in cancer and AIDS. In critically ill patients admitted to an intensive care unit (ICU), such models also appear to be of interest for investigating the effect of treatment on severity scores, because the longitudinal score is likely associated with the dropout process, which is caused either by death or by live discharge from the ICU. However, in this competing risks setting, only cause-specific hazard submodels for the multiple failure types have been used.
We propose a joint model that consists of a linear mixed effects submodel for the longitudinal outcome and a proportional subdistribution hazards submodel for the competing risks survival data, linked together by latent random effects. We use Gibbs sampling, a Markov chain Monte Carlo technique, to estimate the joint posterior distribution of the unknown model parameters. The proposed method is studied in simulations, compared with a joint model that uses cause-specific hazards submodels, and applied to a data set of repeated severity score measurements and times of discharge and death for 1,401 ICU patients.
A time-by-treatment interaction was observed on the evolution of the mean Sequential Organ Failure Assessment (SOFA) score when potentially informative dropouts due to ICU deaths and live discharges from the ICU were ignored. In contrast, this interaction was no longer significant when the cause-specific hazards of informative dropouts were modeled. The time-by-treatment interaction persisted, together with evidence of a treatment effect on the hazard of death, when the dropout processes were modeled with the Fine and Gray model for subdistribution hazards.
In the joint modeling of competing risks with a longitudinal response, differences in how competing risk outcomes are handled appear to translate into differences in the estimated treatment effect on the longitudinal outcome. The modeling strategy should therefore be carefully defined prior to analysis.
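Under a proportional subdistribution hazards (Fine and Gray) submodel, covariates act multiplicatively on the subdistribution hazard of the cause of interest, so the cumulative incidence function (CIF) has the closed form F1(t | x) = 1 − {1 − F10(t)}^exp(x'β). The Python sketch below evaluates this relationship, assuming the baseline CIF p{1 − exp(−λt)} often used in simulation studies; the function name, the baseline, and all parameter values are illustrative assumptions. In a joint model of the kind described above, the linear predictor would additionally contain the shared random effects.

```python
import numpy as np

def cif_fine_gray(t, x, beta, p=0.4, lam=1.0):
    """Cumulative incidence of the event of interest under proportional
    subdistribution hazards: F1(t | x) = 1 - (1 - F10(t)) ** exp(x @ beta).
    The assumed baseline F10(t) = p * (1 - exp(-lam * t)) plateaus at p < 1;
    the remaining probability belongs to the competing event."""
    baseline = p * (1.0 - np.exp(-lam * np.asarray(t, dtype=float)))
    return 1.0 - (1.0 - baseline) ** np.exp(np.dot(x, beta))

t = np.linspace(0.0, 5.0, 6)
print(cif_fine_gray(t, x=[1.0], beta=[0.5]))   # treated arm (hypothetical effect)
print(cif_fine_gray(t, x=[0.0], beta=[0.5]))   # control arm equals the baseline CIF
```

Note that exp(β) is a ratio of subdistribution hazards, not of cause-specific hazards, which is one reason the two modeling choices can lead to different conclusions.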
In clinical studies, longitudinal biomarkers are often used to monitor disease progression and failure time. Joint modeling of longitudinal and survival data has certain advantages and has emerged as an effective way to mutually enhance information. Typically, a parametric longitudinal model is assumed to facilitate the likelihood approach. However, choosing a proper parametric model turns out to be more elusive than in standard longitudinal studies in which no survival endpoint occurs. In this article, we propose a nonparametric multiplicative random effects model for the longitudinal process, which has many applications and leads to a flexible yet parsimonious nonparametric random effects model. A proportional hazards model is then used to link the biomarkers and event time. We use B-splines to represent the nonparametric longitudinal process, and select the number of knots and the degree based on a version of the Akaike information criterion (AIC). Unknown model parameters are estimated by maximizing the observed joint likelihood iteratively with the Monte Carlo Expectation Maximization (MCEM) algorithm. Owing to the simplicity of the model structure, the proposed approach has good numerical stability and compares well with competing parametric longitudinal approaches. The new approach is illustrated with primary biliary cirrhosis (PBC) data, with the aim of capturing nonlinear patterns in serum bilirubin time courses and their relationship with the survival time of PBC patients.
B-splines; EM algorithm; Functional data analysis; Missing data; Monte Carlo integration
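Selecting the number of knots and the degree by AIC can be sketched in a simplified, purely longitudinal setting. The Python example below builds a B-spline basis with equally spaced interior knots using SciPy's `BSpline`, fits each candidate by least squares, and picks the candidate with the smallest Gaussian AIC, n·log(RSS/n) + 2p. This is an assumption-laden stand-in: the paper's AIC is based on the joint likelihood fitted by MCEM, whereas here the helper names, the equally spaced knots, and the simulated curve are all illustrative.

```python
import numpy as np
from scipy.interpolate import BSpline

def bspline_basis(x, n_interior, degree=3, lo=0.0, hi=1.0):
    """Evaluate a B-spline basis with equally spaced interior knots on [lo, hi]."""
    interior = np.linspace(lo, hi, n_interior + 2)[1:-1]
    knots = np.r_[[lo] * (degree + 1), interior, [hi] * (degree + 1)]
    n_basis = len(knots) - degree - 1
    B = np.empty((len(x), n_basis))
    for i in range(n_basis):
        coef = np.zeros(n_basis)
        coef[i] = 1.0                       # pick out the i-th basis function
        B[:, i] = BSpline(knots, coef, degree, extrapolate=False)(x)
    return np.nan_to_num(B)                 # points outside [lo, hi] contribute 0

def gaussian_aic(y, B):
    """AIC of a least-squares spline fit: n*log(RSS/n) + 2*(number of coefficients)."""
    coef, *_ = np.linalg.lstsq(B, y, rcond=None)
    rss = np.sum((y - B @ coef) ** 2)
    n = len(y)
    return n * np.log(rss / n) + 2 * B.shape[1]

rng = np.random.default_rng(0)
x = np.sort(rng.uniform(0.0, 1.0, 200))
y = np.sin(2 * np.pi * x) + rng.normal(scale=0.2, size=x.size)   # simulated trajectory

candidates = [(k, d) for k in range(0, 6) for d in (1, 2, 3)]
best = min(candidates, key=lambda kd: gaussian_aic(y, bspline_basis(x, *kd)))
print("selected (interior knots, degree):", best)
```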
Generalized linear models with serial dependence are often used for short longitudinal series. Heagerty (2002, Biometrics 58, 342–351) has proposed marginalized transition models for the analysis of longitudinal binary data. In this article, we extend this work to accommodate longitudinal ordinal data. Fisher-scoring algorithms are developed for estimation. Methods are illustrated on quality-of-life data from a recent colorectal cancer clinical trial.
Fisher scoring; Generalized linear models; QOL
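In Heagerty's (2002) marginalized transition model for binary data, the marginal mean μ_t follows a logistic regression, while a first-order transition model with dependence parameter γ describes serial dependence; an intercept Δ_t is solved at each time so that the two levels agree: μ_t = expit(Δ_t + γ)·μ_{t−1} + expit(Δ_t)·(1 − μ_{t−1}). The Python sketch below solves this equation by root finding for the binary case only. The ordinal extension and the Fisher-scoring algorithms developed in the article are not reproduced, and the coefficient values are hypothetical.

```python
import numpy as np
from scipy.optimize import brentq
from scipy.special import expit

def solve_delta(mu_t, mu_prev, gamma):
    """Solve for Delta_t so that the lag-1 transition model
    logit P(Y_t = 1 | y_{t-1}) = Delta_t + gamma * y_{t-1}
    reproduces the marginal mean mu_t given mu_prev = P(Y_{t-1} = 1)."""
    def gap(delta):
        implied = expit(delta + gamma) * mu_prev + expit(delta) * (1.0 - mu_prev)
        return implied - mu_t
    # gap() is increasing in delta, so a wide bracket contains the unique root.
    return brentq(gap, -50.0, 50.0)

# Hypothetical marginal model: logit mu_t = beta0 + beta1 * t.
beta0, beta1, gamma = -0.5, 0.3, 1.2
times = np.arange(4)
mu = expit(beta0 + beta1 * times)
deltas = [solve_delta(mu[t], mu[t - 1], gamma) for t in range(1, len(times))]
```

Because Δ_t absorbs the serial dependence, β keeps its population-averaged interpretation regardless of the value of γ.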
Health-related quality of life (HRQL) has become an increasingly important outcome parameter in clinical trials and epidemiological research. HRQL scores are typically bounded at both ends of the scale and often highly skewed. Several regression techniques have been proposed to model such data in cross-sectional studies; however, methods applicable in longitudinal research are less well researched. This study examined the use of beta regression models for analyzing longitudinal HRQL data using two empirical examples with distributional features typically encountered in practice.
We used SF-6D utility data from a German older age cohort study and stroke-specific HRQL data from a randomized controlled trial. We described the conceptual differences between mixed and marginal beta regression models and compared both models to the commonly used linear mixed model in terms of overall fit and predictive accuracy.
At each measurement time, the beta distribution fitted the SF-6D utility data and stroke-specific HRQL data better than the normal distribution. The mixed beta model showed better likelihood-based fit statistics than the linear mixed model and respected the boundedness of the outcome variable. However, it tended to underestimate the true mean at the upper part of the distribution. Adjusted group means from the marginal beta model and the linear mixed model were nearly identical, but differences were observed in the standard errors.
Understanding the conceptual differences between mixed and marginal beta regression models is important for their proper use in the analysis of longitudinal HRQL data. Beta regression fits the typical distribution of HRQL data better than linear mixed models; however, if the focus is on estimating group mean scores rather than making individual predictions, the two methods might not differ substantially.
Health-related quality of life; Beta regression; Longitudinal study; Mixed model; Marginal model
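Beta regression is usually written in a mean-precision form, with responses distributed as Beta(μφ, (1 − μ)φ) and a logit link for μ. Because HRQL scores can lie exactly on the boundaries, a common device (assumed here, not taken from the study) rescales them into the open interval via (y(n − 1) + 0.5)/n. The Python sketch below fits a cross-sectional beta regression by maximum likelihood on simulated data; a mixed beta model would add a subject-level random intercept to the logit mean, while a marginal beta model would instead be fitted with estimating equations. Function names and parameter values are illustrative.

```python
import numpy as np
from scipy.optimize import minimize
from scipy.special import expit, gammaln

def squeeze(y, n):
    """Move scores off the 0/1 boundaries: (y * (n - 1) + 0.5) / n."""
    return (y * (n - 1) + 0.5) / n

def beta_negloglik(params, X, y):
    """Negative log-likelihood of a beta regression with logit mean and log precision."""
    beta, log_phi = params[:-1], params[-1]
    mu = expit(X @ beta)
    phi = np.exp(log_phi)
    a, b = mu * phi, (1.0 - mu) * phi
    ll = (gammaln(phi) - gammaln(a) - gammaln(b)
          + (a - 1.0) * np.log(y) + (b - 1.0) * np.log1p(-y))
    return -np.sum(ll)

rng = np.random.default_rng(42)
n = 2000
X = np.column_stack([np.ones(n), rng.binomial(1, 0.5, n)])   # intercept + group
true_beta, true_phi = np.array([1.0, -0.5]), 10.0
mu = expit(X @ true_beta)
y = squeeze(rng.beta(mu * true_phi, (1.0 - mu) * true_phi), n)

fit = minimize(beta_negloglik, x0=np.zeros(3), args=(X, y), method="BFGS")
print("beta:", fit.x[:2], "phi:", np.exp(fit.x[2]))
```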
Palliative medicine is a relatively new specialty that focuses on preventing and relieving the suffering of patients facing life-threatening illness. For cancer patients, clinical trials have been carried out to compare concurrent palliative care with usual cancer care in terms of longitudinal measurements of quality of life (QOL) until death, with overall survival usually treated as a secondary endpoint. It is known that the QOL of patients with advanced cancer decreases as death approaches; however, previous clinical trials have generally not taken this association into account when making inferences about the effect of an intervention on QOL or survival. We developed a new joint modeling approach, a terminal decline model, to study the trajectory of repeated measurements and survival in a recently completed palliative care study. This approach takes the association of survival and QOL into account by modeling QOL retrospectively from death. Patients whose death times are censored are incorporated into the analysis through the marginal likelihood. Our approach has two submodels: a piecewise linear random intercept model with serial correlation and measurement error for the retrospective trajectory of QOL, and a piecewise exponential model for the survival distribution. Maximum likelihood estimators of the parameters are obtained by maximizing the closed-form log-likelihood function. An explicit expression for quality-adjusted life years can also be derived from our approach. A detailed analysis of data from our previously reported palliative care randomized clinical trial is presented.
end of life; joint modeling; palliative care; quality of life; survival; terminal decline
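The piecewise exponential survival submodel has a closed-form survival function, which is one ingredient of a closed-form log-likelihood. The Python sketch below computes S(t) = exp{−H(t)} and the censored-data log-likelihood Σ[δ_i log h(t_i) + log S(t_i)] for a hazard that is constant within intervals; the function names, cut points, and rates are hypothetical, and the retrospective QOL submodel is not included.

```python
import numpy as np

def piecewise_exp_survival(t, cuts, rates):
    """S(t) = exp(-H(t)) for a hazard equal to rates[k] on [cuts[k], cuts[k+1]);
    cuts must start at 0, and the last rate applies beyond the last cut point."""
    t = np.atleast_1d(np.asarray(t, dtype=float))
    edges = np.append(np.asarray(cuts, dtype=float), np.inf)
    # Time spent in each interval before t (0 for intervals not yet reached).
    exposure = np.clip(t[:, None] - edges[None, :-1], 0.0, np.diff(edges)[None, :])
    return np.exp(-exposure @ np.asarray(rates, dtype=float))

def piecewise_exp_loglik(time, event, cuts, rates):
    """Censored-data log-likelihood: sum of event * log h(t) + log S(t)."""
    time = np.asarray(time, dtype=float)
    k = np.searchsorted(cuts, time, side="right") - 1   # interval containing each time
    log_h = np.log(np.asarray(rates, dtype=float)[k])
    return np.sum(np.asarray(event) * log_h + np.log(piecewise_exp_survival(time, cuts, rates)))

cuts, rates = [0.0, 2.0, 4.0], [0.1, 0.2, 0.3]          # hypothetical cut points and rates
print(piecewise_exp_survival([1.0, 5.0], cuts, rates))  # equals [exp(-0.1), exp(-0.9)]
```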
In medical and biomedical research, binary and binomial outcomes are very common. Such data are often collected longitudinally, repeatedly from a given subject over time, which results in clustering of the observations within subjects and hence correlation. At the same time, the repeated binary outcomes from a given subject constitute a binomial outcome, for which the prescribed mean-variance relationship is often violated, leading to so-called overdispersion.
Two longitudinal binary data sets collected in southwestern Ethiopia are considered: the Jimma infant growth study, where the child’s early growth is studied, and the Jimma longitudinal family survey of youth, where the adolescent’s school attendance is studied over time. A new model that accommodates overdispersion and correlation simultaneously, known as the combined model, is applied. For comparison, commonly used methods for binary and binomial data are also considered: the simple logistic model, which accounts for neither overdispersion nor correlation, and the beta-binomial and logistic-normal models, which accommodate only overdispersion and only correlation, respectively. As an alternative estimation technique, a Bayesian implementation of the combined model is also presented.
The combined model improves model fit and is therefore preferred, based on likelihood comparisons and the deviance information criterion (DIC). Further, the two estimation approaches yield fairly similar parameter estimates and inferences in both case studies. Early initiation of breastfeeding has a protective effect against the risk of overweight in late infancy (p = 0.001), while the proportion of overweight appears to be invariant between males and females over time (p = 0.66). Gender is significantly associated with school attendance, with girls having a lower attendance rate than boys (p = 0.001).
We applied a flexible modeling framework to analyze binary and binomial longitudinal data. Instead of accounting for overdispersion and correlation separately, both can be accommodated simultaneously by including two separate sets of random effects, beta and normal.
Bernoulli model; Beta-binomial model; Binomial model; Logistic-normal model; Maximum likelihood
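To make the idea of combining beta and normal random effects concrete, the Python sketch below simulates binary outcomes under one assumed formulation: a subject-level beta random effect θ_i multiplies a logistic-normal success probability, Y_ij ~ Bernoulli(θ_i · expit(β0 + β1 t_j + b_i)) with b_i ~ N(0, σ²). This data-generating mechanism, the function name, and the parameter values are illustrative assumptions, not the exact specification or the estimates of the study. Summing each subject's binary outcomes gives a binomial count whose variance exceeds the nominal binomial variance.

```python
import numpy as np
from scipy.special import expit

def simulate_combined(n_subjects, n_times, beta, sigma_b, a_beta, b_beta, rng):
    """Hypothetical combined-model simulation:
    Y_ij ~ Bernoulli(theta_i * kappa_ij), theta_i ~ Beta(a_beta, b_beta),
    kappa_ij = expit(beta[0] + beta[1] * t_j + b_i), b_i ~ N(0, sigma_b^2)."""
    t = np.arange(n_times)
    theta = rng.beta(a_beta, b_beta, size=(n_subjects, 1))   # beta random effect
    b_i = rng.normal(0.0, sigma_b, size=(n_subjects, 1))      # normal random effect
    kappa = expit(beta[0] + beta[1] * t[None, :] + b_i)
    return rng.binomial(1, theta * kappa)

rng = np.random.default_rng(7)
Y = simulate_combined(5000, 4, beta=(0.5, -0.2), sigma_b=1.0, a_beta=8.0, b_beta=2.0, rng=rng)
counts = Y.sum(axis=1)                    # binomial count per subject, out of 4
p_hat = counts.mean() / 4
print("observed variance:", counts.var(), "binomial variance:", 4 * p_hat * (1 - p_hat))
```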
Longitudinal studies with binary repeated measures are widespread in biomedical research. Marginal regression approaches for balanced binary data are well developed, while for binary process data, where measurement times are irregular and may differ across individuals, likelihood-based methods for marginal regression analysis are less well developed. In this article, we develop a Bayesian regression model for analyzing longitudinal binary process data, with emphasis on dealing with missingness. We focus on settings where data are missing at random, which require a correctly specified joint distribution for the repeated measures in order to draw valid likelihood-based inference about the marginal mean. To provide maximum flexibility, the proposed model specifies both the marginal mean and serial dependence structures using nonparametric smooth functions. Serial dependence is allowed to depend on the time lag between adjacent outcomes as well as on other relevant covariates. Inference is fully Bayesian. Using simulations, we show that adequate modeling of the serial dependence structure is necessary for valid inference about the marginal mean when the binary process data are missing at random. Longitudinal viral load data from the HIV Epidemiology Research Study (HERS) are analyzed for illustration.
Repeated measures; Marginal model; Nonparametric regression; Penalized splines; HIV/AIDS; Antiviral treatment
Diverse analysis approaches have been proposed to distinguish data missing due to death from nonresponse, and to summarize trajectories of longitudinal data truncated by death. We demonstrate how these analysis approaches arise from factorizations of the joint distribution of longitudinal data and survival information. Models are illustrated using cognitive functioning data for older adults. Unconditional models assume that deaths do not occur or that deaths are independent of the longitudinal response, or they average the unconditional longitudinal response over the survival distribution. Unconditional models, such as random effects models fit to unbalanced data, may implicitly impute data beyond the time of death. Fully conditional models stratify the longitudinal response trajectory by time of death. Fully conditional models are effective for describing individual trajectories, in terms of either aging (age, or years from baseline) or dying (years from death). Causal models (principal stratification) as currently applied are fully conditional models, since group differences at one timepoint are described for a cohort that will survive past a later timepoint. Partly conditional models summarize the longitudinal response in the dynamic cohort of survivors. Partly conditional models are serial cross-sectional snapshots of the response, reflecting the average response in survivors at a given timepoint rather than individual trajectories. Joint models of survival and longitudinal response describe the evolving health status of the entire cohort. Researchers using longitudinal data should consider which method of accommodating deaths is consistent with their research aims, and choose analysis methods accordingly.
Censoring; Generalized estimating equations; Longitudinal data; Missing data; Quality of life; Random effects models; Truncation by death
Two models for the analysis of longitudinal binary data are discussed: the marginal model and the random intercepts model. In contrast to the linear mixed model (LMM), the two models for binary data are not subsumed under a single hierarchical model. The marginal model provides group-level information whereas the random intercepts model provides individual-level information including information about heterogeneity of growth. It is shown how a type of numerical averaging can be used with the random intercepts model to obtain group-level information, thus approximating individual and marginal aspects of the LMM. The types of inferences associated with each model are illustrated with longitudinal criminal offending data based on N = 506 males followed over a 22-year period. Violent offending indexed by official records and self-report were analyzed, with the marginal model estimated using generalized estimating equations and the random intercepts model estimated using maximum likelihood. The results show that the numerical averaging based on the random intercepts can produce prediction curves almost identical to those obtained directly from the marginal model parameter estimates. The results provide a basis for contrasting the models and the estimation procedures and key features are discussed to aid in selecting a method for empirical analysis.
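The numerical averaging step can be sketched in Python as follows, assuming a random intercept b ~ N(0, σ²) on the logit scale: the population-averaged probability E_b[expit(η + b)] is approximated with Gauss–Hermite quadrature. The linear predictor and σ below are hypothetical values, not estimates from the offending data. Because averaging over b pulls probabilities toward 0.5, the resulting curve is flatter than the subject-specific curve for b = 0, which is why random intercept and marginal coefficients are on different scales.

```python
import numpy as np
from numpy.polynomial.hermite import hermgauss
from scipy.special import expit

def marginal_prob(eta, sigma, n_nodes=30):
    """Population-averaged P(Y = 1) = E_b[expit(eta + b)], b ~ N(0, sigma^2),
    via Gauss-Hermite quadrature with the change of variable b = sqrt(2)*sigma*x."""
    x, w = hermgauss(n_nodes)
    eta = np.asarray(eta, dtype=float)[..., None]
    return (w * expit(eta + np.sqrt(2.0) * sigma * x)).sum(axis=-1) / np.sqrt(np.pi)

years = np.arange(0, 23)
eta = -3.0 + 0.25 * years - 0.012 * years**2   # hypothetical subject-specific curve (b = 0)
subject_specific = expit(eta)
population_averaged = marginal_prob(eta, sigma=1.5)
```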
Dropout is a common occurrence in longitudinal studies. Building upon the pattern-mixture modeling approach within the Bayesian paradigm, we propose a general framework of varying-coefficient models for longitudinal data with informative dropout, where measurement times can be irregular and dropout can occur at any point in continuous time (not just at observation times) together with administrative censoring. Specifically, we assume that the longitudinal outcome process depends on the dropout process through its model parameters. The unconditional distribution of the repeated measures is a mixture over the dropout (administrative censoring) time distribution, and the continuous dropout time distribution with administrative censoring is left completely unspecified. We use Markov chain Monte Carlo to sample from the posterior distribution of the repeated measures given the dropout (administrative censoring) times; Bayesian bootstrapping on the observed dropout (administrative censoring) times is carried out to obtain marginal covariate effects. We illustrate the proposed framework using data from a longitudinal study of depression in HIV-infected women; a strategy for sensitivity analysis of the unverifiable assumptions is also demonstrated.
HIV/AIDS; Missing data; Nonparametric regression; Penalized splines
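The Bayesian bootstrap step can be illustrated with a short Python sketch: Dirichlet(1, …, 1) weights over the observed dropout times are used to average a dropout-time-specific covariate effect, giving one draw of the marginal effect per weight vector. The varying-coefficient function and the simulated dropout times below are hypothetical; in the full method, posterior draws of the coefficient function from MCMC would take the place of the fixed function used here.

```python
import numpy as np

def bayesian_bootstrap_marginal(effect_given_dropout, dropout_times, n_draws, rng):
    """Average a dropout-time-specific covariate effect over the observed dropout
    times using Dirichlet(1, ..., 1) weights (the Bayesian bootstrap)."""
    effects = effect_given_dropout(np.asarray(dropout_times, dtype=float))
    weights = rng.dirichlet(np.ones(len(dropout_times)), size=n_draws)
    return weights @ effects          # one marginal effect per bootstrap draw

def hypothetical_effect(u):
    """Hypothetical varying coefficient: the effect shrinks for later dropout times."""
    return 1.0 - 0.5 * np.log1p(u)

rng = np.random.default_rng(3)
dropout = rng.exponential(scale=2.0, size=150)
draws = bayesian_bootstrap_marginal(hypothetical_effect, dropout, 2000, rng)
print(draws.mean(), np.percentile(draws, [2.5, 97.5]))
```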
We implement a joint model for mixed multivariate longitudinal measurements, applied to the prediction of time until lung transplant or death in idiopathic pulmonary fibrosis. Specifically, we formulate a unified Bayesian joint model for the mixed longitudinal responses and time-to-event outcomes. For the longitudinal model of continuous and binary responses, we investigate multivariate generalized linear mixed models using shared random effects. Longitudinal and time-to-event data are assumed to be independent conditional on available covariates and shared parameters. A Markov chain Monte Carlo (MCMC) algorithm, implemented in OpenBUGS, is used for parameter estimation. To illustrate practical considerations in choosing a final model, we fit 37 different candidate models using all possible combinations of random effects and use the Deviance Information Criterion (DIC) to select the best-fitting model. We demonstrate the prediction of future event probabilities within a fixed time interval for patients, using baseline data, post-baseline longitudinal responses, and the time-to-event outcome. The performance of our joint model is also evaluated in simulation studies.
Idiopathic Pulmonary Fibrosis; Joint model; Mixed continuous and binary data; Multivariate longitudinal data; Prediction model; Shared parameter model; Survival analysis
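The DIC used for model selection combines the posterior mean deviance with an effective number of parameters: DIC = D̄ + p_D, where D(θ) = −2 log L(θ) and p_D = D̄ − D(θ̄). The Python sketch below is a generic computation from posterior draws for a toy normal-mean model, assuming a flat prior and known unit variance so that p_D should be close to 1; the function names and data are illustrative and unrelated to the fibrosis models.

```python
import numpy as np

def dic(loglik, draws):
    """Deviance information criterion from posterior draws (one row per draw).
    Returns (DIC, pD) with D(theta) = -2 log L(theta) and pD = Dbar - D(theta_bar)."""
    deviances = np.array([-2.0 * loglik(theta) for theta in draws])
    d_bar = deviances.mean()
    d_at_mean = -2.0 * loglik(draws.mean(axis=0))
    p_d = d_bar - d_at_mean
    return d_bar + p_d, p_d

# Toy model: y_i ~ N(theta, 1); a flat prior gives the posterior N(ybar, 1/n).
rng = np.random.default_rng(11)
y = rng.normal(1.0, 1.0, size=100)

def normal_loglik(theta):
    return np.sum(-0.5 * np.log(2 * np.pi) - 0.5 * (y - theta[0]) ** 2)

draws = rng.normal(y.mean(), 1.0 / np.sqrt(y.size), size=(5000, 1))
dic_value, p_d = dic(normal_loglik, draws)
print(dic_value, p_d)      # p_d should be close to 1 (one free parameter)
```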
Existing joint models for longitudinal and survival data are not applicable to longitudinal ordinal outcomes with possibly non-ignorable missing values arising from multiple causes. We propose a joint model for longitudinal ordinal measurements and competing risks failure time data, in which a partial proportional odds model for the longitudinal ordinal outcome is linked to the event times by latent random variables. At the survival endpoint, our model adopts the competing risks framework to model multiple failure types at the same time. The partial proportional odds model, as an extension of the popular proportional odds model for ordinal outcomes, is more flexible and at the same time provides a tool to test the proportional odds assumption. We use a likelihood approach and derive an EM algorithm to obtain the maximum likelihood estimates of the parameters. We further show that all the parameters at the survival endpoint are identifiable from the data. Our joint model enables one to make inference for both the longitudinal ordinal outcome and the failure times simultaneously. In addition, the inference at the longitudinal endpoint is adjusted for possible non-ignorable missing data caused by the failure times. We apply the method to the NINDS rt-PA stroke trial. Our study considers the modified Rankin Scale only; other ordinal outcomes in the trial, such as the Barthel and Glasgow scales, can be treated in the same way.
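The partial proportional odds model can be written with cumulative logits, logit P(Y ≤ k | x, w) = α_k − x'β − w'γ_k, so that covariates in w have threshold-specific effects and γ_k = 0 for all k recovers the proportional odds model; testing γ = 0 is then a test of the proportional odds assumption. The Python sketch below converts these cumulative logits into category probabilities. The sign convention, function name, and example values are assumptions, and the latent random variables linking the outcome to the event times are omitted.

```python
import numpy as np
from scipy.special import expit, logit

def ppo_probs(alpha, x, beta, w, gamma):
    """Category probabilities under a partial proportional odds model:
    logit P(Y <= k) = alpha[k] - x @ beta - w @ gamma[k],  k = 0, ..., K-2.
    Covariates in x obey proportional odds; those in w get threshold-specific
    effects gamma[k]. Setting gamma to zero gives the proportional odds model."""
    eta = np.asarray(alpha) - x @ beta - np.asarray(gamma) @ w
    cumulative = np.concatenate([[0.0], expit(eta), [1.0]])
    probs = np.diff(cumulative)
    if np.any(probs < 0):
        raise ValueError("cumulative probabilities are not monotone for these covariates")
    return probs

# Hypothetical 4-category outcome, one PO covariate and one non-PO covariate.
alpha = np.array([-1.0, 0.0, 1.5])
beta = np.array([0.8])
gamma = np.array([[0.0], [0.3], [0.6]])
p = ppo_probs(alpha, np.array([1.0]), beta, np.array([1.0]), gamma)
print(p, logit(np.cumsum(p)[:-1]))   # second array recovers the cumulative logits
```

The monotonicity check reflects a known practical limitation: with threshold-specific effects, some covariate values can imply negative category probabilities.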
In a unique longitudinal study of teen driving, risky driving behavior and the occurrence of crashes or near crashes are measured prospectively over the first 18 months of licensure. Of scientific interest is relating the two processes and developing a predictor of crashes from previous risky driving behavior. In this work, we propose two latent class models for relating risky driving behavior to the occurrence of a crash or near crash event. The first approach models the binary longitudinal crash/near crash outcome using a binary latent variable that depends on risky driving covariates and previous outcomes. A random effects model introduces heterogeneity among subjects in modeling the mean value of the latent state. The second approach extends the first model to the ordinal case, where the latent state is composed of K ordinal classes. Additionally, we discuss an alternative hidden Markov model formulation. Estimation is performed using the expectation-maximization (EM) algorithm and Monte Carlo EM. We illustrate the importance of these latent class modeling approaches through an analysis of the teen driving behavior data.
driving study; latent class modeling; Monte Carlo EM
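For the hidden Markov model formulation, the likelihood of a subject's binary crash/near crash sequence sums over all latent state paths, which the forward algorithm does in time linear in the sequence length. The Python sketch below is a generic scaled forward algorithm for a homogeneous two-state model, assuming fixed initial, transition, and emission probabilities; the covariate-dependent latent states and random effects of the proposed models are not included, and all values are hypothetical.

```python
import numpy as np

def hmm_loglik(y, init, trans, p_emit):
    """Log-likelihood of a binary sequence under a homogeneous hidden Markov model,
    via the scaled forward algorithm.
    init[s] = P(S_1 = s); trans[r, s] = P(S_t = s | S_{t-1} = r);
    p_emit[s] = P(Y_t = 1 | S_t = s)."""
    y = np.asarray(y)
    emit = np.where(y[:, None] == 1, p_emit[None, :], 1.0 - p_emit[None, :])
    alpha = init * emit[0]
    scale = alpha.sum()
    loglik, alpha = np.log(scale), alpha / scale
    for t in range(1, len(y)):
        alpha = (alpha @ trans) * emit[t]     # propagate, then weight by emission
        scale = alpha.sum()
        loglik, alpha = loglik + np.log(scale), alpha / scale
    return loglik

# Hypothetical two-state model: a low-risk and a high-risk latent state.
init = np.array([0.7, 0.3])
trans = np.array([[0.9, 0.1], [0.2, 0.8]])
p_emit = np.array([0.05, 0.4])               # P(crash/near crash | state)
y_seq = [0, 0, 1, 0, 1, 1]
ll = hmm_loglik(y_seq, init, trans, p_emit)
```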
Random-effects change point models are formulated for longitudinal data obtained from cognitive tests. The conditional distribution of the response variable in a change point model is often assumed to be normal even if the response variable is discrete and shows ceiling effects. For the sum score of a cognitive test, the binomial and the beta-binomial distributions are presented as alternatives to the normal distribution. Smooth shapes for the change point models are imposed. Estimation is by marginal maximum likelihood where a parametric population distribution for the random change point is combined with a non-parametric mixing distribution for other random effects. An extension to latent class modelling is possible in case some individuals do not experience a change in cognitive ability. The approach is illustrated using data from a longitudinal study of Swedish octogenarians and nonagenarians that began in 1991. Change point models are applied to investigate cognitive change in the years before death.
Beta-binomial distribution; Latent class model; Mini-mental state examination; Random-effects model
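A sum score bounded at n (30 for the Mini-Mental State Examination) can be modeled with a beta-binomial distribution whose mean follows a smooth change point curve on the logit scale. The Python sketch below uses the smooth transition b0 + b1·d + b2·(d + √(d² + γ))/2 with d = t − τ, which has slope b1 well before τ and b1 + b2 well after it, and compares beta-binomial and binomial log-likelihoods on simulated scores. This particular smoothing, the precision φ, and all parameter values are assumptions for illustration, not the paper's specification.

```python
import numpy as np
from scipy.special import expit
from scipy.stats import betabinom, binom

def smooth_change_point(t, b0, b1, b2, tau, gamma=0.1):
    """Logit-scale mean with slope b1 well before tau and b1 + b2 well after;
    sqrt(d**2 + gamma) smooths the kink of a broken-stick model at d = 0."""
    d = np.asarray(t, dtype=float) - tau
    return b0 + b1 * d + b2 * (d + np.sqrt(d**2 + gamma)) / 2.0

n_items = 30                                   # maximum sum score
years_to_death = np.linspace(-10.0, 0.0, 11)   # hypothetical measurement times
pi = expit(smooth_change_point(years_to_death, b0=2.0, b1=0.05, b2=-0.6, tau=-4.0))
phi = 20.0                                     # hypothetical beta-binomial precision
a, b = pi * phi, (1.0 - pi) * phi

rng = np.random.default_rng(0)
scores = betabinom.rvs(n_items, a, b, random_state=rng)     # simulated scores
ll_betabin = betabinom.logpmf(scores, n_items, a, b).sum()
ll_bin = binom.logpmf(scores, n_items, pi).sum()
print(ll_betabin, ll_bin)
```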
Estimation of variance components by Monte Carlo (MC) expectation maximization (EM) restricted maximum likelihood (REML) is computationally efficient for large data sets and complex linear mixed effects models. However, efficiency may be lost because the EM algorithm needs a large number of iterations. To decrease computing time, we explored the use of faster-converging Newton-type algorithms within MC REML implementations. The implemented algorithms were: MC Newton-Raphson (NR), where the information matrix was generated via sampling; MC average information (AI), where the information was computed as an average of observed and expected information; and MC Broyden's method, where the zero of the gradient was searched for using a quasi-Newton-type algorithm. The performance of these algorithms was evaluated using simulated data. The final estimates were in good agreement with the corresponding analytical ones. MC NR REML and MC AI REML improved convergence compared with MC EM REML and gave standard errors for the estimates as a by-product. MC NR REML required a larger number of MC samples, while each MC AI REML iteration required the mixed model equations to be solved an additional number of times equal to the number of parameters to be estimated. MC Broyden's method required the largest number of MC samples with our small data set and did not give standard errors for the parameters directly. We studied the performance of three different convergence criteria for the MC AI REML algorithm. Our results indicate the importance of defining a suitable convergence criterion and critical value in order to obtain an efficient Newton-type method utilizing an MC algorithm. Overall, the use of an MC algorithm with Newton-type methods proved feasible, and the results encourage testing these methods in different kinds of large-scale problem settings.
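Broyden's method avoids recomputing the Jacobian of the gradient at every step: it keeps an approximation J and applies a rank-one secant correction so that the updated matrix maps the step s onto the observed change in the gradient. The Python sketch below applies the "good" Broyden update to a deterministic gradient of a quadratic; in MC REML the gradient would be a Monte Carlo estimate of the REML score, so the stopping rule would have to tolerate sampling noise, which is one reason the choice of convergence criterion matters. The function name and test problem are illustrative.

```python
import numpy as np

def broyden_root(g, x0, J0=None, tol=1e-8, max_iter=100):
    """Solve g(x) = 0 with Broyden's ("good") method: rank-one secant updates
    of a Jacobian approximation J instead of recomputing it each iteration."""
    x = np.asarray(x0, dtype=float)
    J = np.eye(x.size) if J0 is None else np.array(J0, dtype=float)  # copy
    gx = g(x)
    for _ in range(max_iter):
        step = np.linalg.solve(J, -gx)
        x_new = x + step
        g_new = g(x_new)
        if np.linalg.norm(g_new) < tol:
            return x_new
        # Secant condition: after the update, J @ step == g_new - gx.
        J += np.outer(g_new - gx - J @ step, step) / (step @ step)
        x, gx = x_new, g_new
    raise RuntimeError("Broyden iteration did not converge")

# Gradient of the quadratic 0.5 * x'Ax - b'x; its root is A^{-1} b.
A = np.array([[3.0, 1.0], [1.0, 2.0]])
b = np.array([1.0, 1.0])

def grad(x):
    return A @ x - b

x_star = broyden_root(grad, np.zeros(2))
```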
Time index-ordered random variables are said to be antedependent (AD) of order (p_1, p_2, …, p_n) if the kth variable, conditioned on the p_k immediately preceding variables, is independent of all further preceding variables. Inferential methods associated with AD models are well developed for continuous (primarily normal) longitudinal data, but not for categorical longitudinal data. In this article, we develop likelihood-based inferential procedures for unstructured AD models for categorical longitudinal data. Specifically, we derive maximum likelihood estimators (MLEs) of model parameters; penalized likelihood criteria and likelihood ratio tests for determining the order of antedependence; and likelihood ratio tests for homogeneity across groups, time-invariance of transition probabilities, and strict stationarity. Closed-form expressions for the MLEs and test statistics, which allow for the possibility of empty cells and monotone missing data, are given for all cases except strict stationarity. For data with an arbitrary missingness pattern, we derive an efficient restricted EM algorithm for obtaining MLEs. The performance of the tests is evaluated by simulation. The methods are applied to longitudinal studies of toenail infection severity (measured on a binary scale) and Alzheimer’s disease severity (measured on an ordinal scale). The analysis of the toenail infection severity data reveals interesting nonstationary behavior of the transition probabilities and indicates that an unstructured first-order AD model is superior to stationary and other structured first-order AD models that have previously been fit to these data. The analysis of the Alzheimer’s severity data indicates that the antedependence is second-order with time-invariant transition probabilities, suggesting the use of a second-order autoregressive cumulative logit model.
Likelihood ratio test; Markov models; Missing data; Transition models
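For an unstructured first-order AD model with complete data, the MLEs of the transition probabilities are the observed transition proportions for each pair of consecutive times, and a likelihood ratio test of time-invariant transition probabilities compares these time-specific estimates with pooled ones. The Python sketch below assumes complete (no missing) data, categories coded 0, …, K − 1, and no empty rows when computing the degrees of freedom (T − 2)K(K − 1); the function names and simulated data are hypothetical.

```python
import numpy as np
from scipy.stats import chi2

def transition_counts(Y, K):
    """counts[t, i, j] = number of subjects moving from category i at time t to j at t+1."""
    n, T = Y.shape
    counts = np.zeros((T - 1, K, K))
    for t in range(T - 1):
        np.add.at(counts[t], (Y[:, t], Y[:, t + 1]), 1)
    return counts

def transition_mles(counts):
    """Row-normalize counts; rows with no subjects (empty cells) stay NaN."""
    totals = counts.sum(axis=-1, keepdims=True)
    with np.errstate(invalid="ignore", divide="ignore"):
        return counts / totals

def multinomial_loglik(counts):
    """Maximized transition log-likelihood: sum of n_ij * log(n_ij / n_i+), 0 log 0 = 0."""
    totals = counts.sum(axis=-1, keepdims=True)
    with np.errstate(invalid="ignore", divide="ignore"):
        terms = np.where(counts > 0, counts * np.log(counts / totals), 0.0)
    return terms.sum()

rng = np.random.default_rng(5)
n, T, K = 300, 5, 2
Y = np.zeros((n, T), dtype=int)
Y[:, 0] = rng.binomial(1, 0.5, n)
for t in range(1, T):                       # simulated data with time-invariant transitions
    Y[:, t] = rng.binomial(1, np.where(Y[:, t - 1] == 1, 0.8, 0.2))

counts = transition_counts(Y, K)
P_hat = transition_mles(counts)
lr = 2 * (multinomial_loglik(counts) - multinomial_loglik(counts.sum(axis=0)))
p_value = chi2.sf(lr, (T - 2) * K * (K - 1))
```

The time-1 marginal distribution contributes equally to both models, so it cancels from the likelihood ratio statistic and is omitted.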
Understanding human sexual behaviors is essential for the effective prevention of sexually transmitted infections. Analysis of longitudinally measured sexual behavioral data, however, is often complicated by zero-inflation of event counts, nonlinear time trends, time-varying covariates, and informative dropouts. Ignoring these complicating factors could undermine the validity of the study findings. In this paper, we put forth a unified joint modeling structure that accommodates these features of the data. Specifically, we propose a pair of simultaneous models for the zero-inflated event counts: each of these models contains an autoregressive structure to accommodate the effect of recent event history, and a nonparametric component to model the nonlinear time effect. Informative dropout and time-varying covariates are modeled explicitly in the process. Model fitting and parameter estimation are carried out in a Bayesian paradigm using a Markov chain Monte Carlo (MCMC) method. The analyses showed that adolescent sexual behaviors tended to evolve nonlinearly over time and were strongly influenced by day-to-day variations in mood and sexual interests. These findings suggest that adolescent sex is to a large extent driven by intrinsic factors rather than compelled by circumstances, highlighting the need for education on self-protective measures against infection risks.
Joint modeling; Markov Chain Monte Carlo; Mood; Sexually transmitted infections; Zero-inflated Poisson
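A zero-inflated Poisson (ZIP) model mixes a point mass at zero with a Poisson count, and an autoregressive term can let recent event history shift both parts. The Python sketch below computes the ZIP log-probability and a conditional log-likelihood in which the previous day's count enters both the Poisson mean and the zero-inflation probability through log(1 + y_{t−1}). This lag structure, the coefficient names, and the toy counts are assumptions for illustration; the nonparametric time trend and the dropout model of the paper are omitted.

```python
import numpy as np
from scipy.special import expit, gammaln

def zip_logpmf(y, lam, pi):
    """log P(Y = y) for a zero-inflated Poisson with Poisson mean lam and
    extra-zero probability pi: P(0) = pi + (1 - pi) * exp(-lam)."""
    y = np.asarray(y)
    log_poisson = -lam + y * np.log(lam) - gammaln(y + 1)
    log_zero = np.log(pi + (1.0 - pi) * np.exp(-lam))
    return np.where(y == 0, log_zero, np.log1p(-pi) + log_poisson)

def zip_ar_loglik(y, beta, alpha):
    """Conditional log-likelihood of y[1:] given y[0], where (an assumption)
    log lam_t = beta[0] + beta[1] * log(1 + y[t-1]) and
    logit pi_t = alpha[0] + alpha[1] * log(1 + y[t-1])."""
    y = np.asarray(y)
    lagged = np.log1p(y[:-1])
    lam = np.exp(beta[0] + beta[1] * lagged)
    pi = expit(alpha[0] + alpha[1] * lagged)
    return zip_logpmf(y[1:], lam, pi).sum()

toy_counts = np.array([0, 0, 2, 1, 0, 0, 0, 3, 1, 0])   # toy daily event counts
print(zip_ar_loglik(toy_counts, beta=(0.0, 0.5), alpha=(0.5, -1.0)))
```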
Many randomized clinical trials collect multivariate longitudinal measurements on different scales, for example, binary, ordinal, and continuous. Multilevel item response models are used to evaluate the global treatment effects across multiple outcomes while accounting for all sources of correlation. Continuous measurements are often assumed to be normally distributed, but model inference is not robust when the normality assumption is violated because of heavy tails and outliers. In this article, we develop a Bayesian method for multilevel item response models that replaces the normal distributions with symmetric heavy-tailed normal/independent distributions. Inference is conducted in a Bayesian framework via Markov chain Monte Carlo simulation implemented in the BUGS language. The proposed method is evaluated by simulation studies and applied to the Earlier versus Later Levodopa Therapy in Parkinson’s Disease study, a motivating clinical trial assessing the effect of levodopa therapy on the rate of Parkinson’s disease progression.
item response theory; latent variable; Markov Chain Monte Carlo; robust inference; clinical trial
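Normal/independent distributions are scale mixtures of normals: X = μ + σZ/√U with Z ~ N(0, 1) independent of a positive mixing variable U. Gamma(ν/2, rate ν/2) mixing gives the Student-t distribution with ν degrees of freedom, and Beta(ν, 1) mixing gives the slash distribution. The Python sketch below draws from both; it illustrates the distribution family only, not the multilevel item response model or its BUGS implementation, and the function name and parameter values are illustrative.

```python
import numpy as np
from scipy import stats

def rnormal_independent(n, mu, sigma, nu, mixing, rng):
    """Draw X = mu + sigma * Z / sqrt(U) with Z ~ N(0, 1) independent of U.
    mixing='t':     U ~ Gamma(shape=nu/2, rate=nu/2) -> Student-t with nu df
    mixing='slash': U ~ Beta(nu, 1)                  -> slash distribution"""
    if mixing == "t":
        u = rng.gamma(shape=nu / 2.0, scale=2.0 / nu, size=n)
    elif mixing == "slash":
        u = rng.beta(nu, 1.0, size=n)
    else:
        raise ValueError("mixing must be 't' or 'slash'")
    return mu + sigma * rng.standard_normal(n) / np.sqrt(u)

rng = np.random.default_rng(4)
x_t = rnormal_independent(20000, 0.0, 1.0, 5.0, "t", rng)
x_slash = rnormal_independent(20000, 0.0, 1.0, 2.0, "slash", rng)
# Heavier tails show up as larger upper quantiles than the standard normal's.
print(np.quantile(x_t, 0.995), stats.t.ppf(0.995, df=5), stats.norm.ppf(0.995))
```

Because U acts as a per-observation weight, outlying observations are effectively down-weighted when the model is fitted, which is the source of the robustness.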
Logistic random effects models are a popular tool to analyze multilevel also called hierarchical data with a binary or ordinal outcome. Here, we aim to compare different statistical software implementations of these models.
We used individual patient data from 8509 patients in 231 centers with moderate and severe Traumatic Brain Injury (TBI) enrolled in eight Randomized Controlled Trials (RCTs) and three observational studies. We fitted logistic random effects regression models with the 5-point Glasgow Outcome Scale (GOS) as outcome, both dichotomized as well as ordinal, with center and/or trial as random effects, and as covariates age, motor score, pupil reactivity or trial. We then compared the implementations of frequentist and Bayesian methods to estimate the fixed and random effects. Frequentist approaches included R (lme4), Stata (GLLAMM), SAS (GLIMMIX and NLMIXED), MLwiN ([R]IGLS) and MIXOR, Bayesian approaches included WinBUGS, MLwiN (MCMC), R package MCMCglmm and SAS experimental procedure MCMC.
Three data sets (the full data set and two sub-datasets) were analysed using basically two logistic random effects models with either one random effect for the center or two random effects for center and trial. For the ordinal outcome in the full data set also a proportional odds model with a random center effect was fitted.
The packages gave similar parameter estimates for both the fixed and random effects and for the binary (and ordinal) models for the main study and when based on a relatively large number of level-1 (patient level) data compared to the number of level-2 (hospital level) data. However, when based on relatively sparse data set, i.e. when the numbers of level-1 and level-2 data units were about the same, the frequentist and Bayesian approaches showed somewhat different results. The software implementations differ considerably in flexibility, computation time, and usability. There are also differences in the availability of additional tools for model evaluation, such as diagnostic plots. The experimental SAS (version 9.2) procedure MCMC appeared to be inefficient.
On relatively large data sets, the different software implementations of logistic random effects regression models produced similar results. For a large data set, therefore, there seems to be no explicit reason to prefer a frequentist or a Bayesian approach (the latter based on vague priors), unless one is preferred on philosophical grounds. The choice of a particular implementation may depend largely on the desired flexibility and on the usability of the package. For small data sets the random effects variances are difficult to estimate. In the frequentist approaches, the MLE of this variance was often zero, with a standard error that was either zero or could not be determined. In the Bayesian methods, the estimates could depend on the chosen "non-informative" prior for the variance parameter. The starting value for the variance parameter may also be critical for the convergence of the Markov chain.
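The frequentist fits all maximize a marginal likelihood in which each center's random intercept is integrated out. The packages differ in how they approximate that integral. The Python sketch below uses non-adaptive Gauss-Hermite quadrature for a random-intercept logistic model. It is our own illustration, not code from any of the packages compared, and the function names are hypothetical. It also illustrates the small-sample behaviour described above: at $\sigma = 0$ the marginal likelihood reduces exactly to the ordinary logistic likelihood. That value lies on the boundary of the parameter space, so when the data carry little between-center information the maximum can fall there.

```python
import numpy as np

def expit(x):
    return 1.0 / (1.0 + np.exp(-x))

def marginal_loglik(beta, sigma, X, y, cluster, n_nodes=20):
    """Marginal log-likelihood of a random-intercept logistic model (sketch).

    Assumed model: logit P(y_ij = 1 | b_i) = X_ij @ beta + b_i, b_i ~ N(0, sigma^2).
    Each cluster's integral over b_i uses non-adaptive Gauss-Hermite quadrature.
    """
    nodes, weights = np.polynomial.hermite.hermgauss(n_nodes)
    b = np.sqrt(2.0) * sigma * nodes  # nodes rescaled for N(0, sigma^2)
    eta = X @ beta
    total = 0.0
    for c in np.unique(cluster):
        idx = cluster == c
        p = expit(eta[idx][:, None] + b[None, :])  # shape (n_i, n_nodes)
        yi = y[idx][:, None]
        # Log of the conditional likelihood of the whole cluster at each node
        log_cond = np.sum(np.where(yi == 1, np.log(p), np.log1p(-p)), axis=0)
        m = log_cond.max()  # log-sum-exp shift for numerical stability
        # Weights sum to sqrt(pi), so dividing by sqrt(pi) gives an average
        total += m + np.log(np.sum(weights * np.exp(log_cond - m)) / np.sqrt(np.pi))
    return total
```

Maximizing this function over $(\beta, \sigma)$ with a quasi-Newton routine would give ML estimates. Adaptive quadrature recentres and rescales the nodes for each cluster and usually needs far fewer of them. Evaluating the function on a grid of $\sigma$ values for a small, sparse data set is a quick way to see whether the profile is flat or peaks at zero.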
It is of interest to estimate the distribution of usual nutrient intake for a population from repeat 24-h dietary recall assessments. A mixed effects model and quantile estimation procedure, developed at the National Cancer Institute (NCI), may be used for this purpose. The model incorporates a Box–Cox parameter and covariates to estimate usual daily intake of nutrients; model parameters are estimated via quasi-Newton optimization of a likelihood approximated by adaptive Gaussian quadrature. The parameter estimates are used in a Monte Carlo approach to generate empirical quantiles; standard errors are estimated by bootstrap. The NCI method is illustrated and compared with current estimation methods, including the individual mean and the semi-parametric method developed at Iowa State University (ISU), using data from a random sample and computer simulations. For nutrients, both the NCI and ISU methods are superior to the distribution of individual means. For simple (no-covariate) models, quantile estimates are similar between the NCI and ISU methods. The bootstrap approach used by the NCI method to estimate standard errors of quantiles appears preferable to Taylor linearization. One major advantage of the NCI method is its ability to provide estimates for subpopulations through the incorporation of covariates into the model. The NCI method may be used for estimating the distribution of usual nutrient intake for populations and subpopulations as part of a unified framework for estimating usual intake of dietary constituents.
statistical distributions; diet surveys; nutrition assessment; mixed-effects model; nutrients; percentiles
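The Box–Cox and Monte Carlo steps of the method above can be sketched in Python. This is a deliberately simplified illustration, not the NCI implementation. It assumes the model has already been fitted, and it treats a person's usual intake on the transformed scale as $\mu + b_i$ with $b_i \sim N(0, \sigma_b^2)$. It also back-transforms without any correction for within-person (day-to-day) variation. The function names and parameter values are hypothetical.

```python
import numpy as np

def boxcox(x, lam):
    """Box-Cox transform (lam = 0 is the log transform); x must be positive."""
    x = np.asarray(x, dtype=float)
    return np.log(x) if lam == 0 else (x**lam - 1.0) / lam

def inv_boxcox(z, lam):
    """Inverse Box-Cox transform; values outside the support are truncated at zero."""
    z = np.asarray(z, dtype=float)
    if lam == 0:
        return np.exp(z)
    return np.clip(lam * z + 1.0, 0.0, None) ** (1.0 / lam)

def usual_intake_quantiles(mu, sigma_b, lam, probs, n_sim=100_000, seed=0):
    """Monte Carlo quantiles of usual intake under a simplified random-effects model.

    Assumption: transformed usual intake is mu + b_i with b_i ~ N(0, sigma_b^2);
    no within-person variance adjustment is made when back-transforming.
    """
    if lam < 0:
        raise ValueError("this sketch assumes lam >= 0")
    rng = np.random.default_rng(seed)
    z = mu + sigma_b * rng.standard_normal(n_sim)  # simulated persons, transformed scale
    intake = inv_boxcox(z, lam)                    # back to the original intake scale
    return np.quantile(intake, probs)

# Hypothetical subgroups that differ only through a covariate effect on mu
for label, mu in [("subgroup A", 2.8), ("subgroup B", 3.1)]:
    print(label, usual_intake_quantiles(mu, 0.4, 0.0, [0.10, 0.50, 0.90]))
```

Covariates enter through $\mu$, which is how subgroup-specific quantiles arise in this sketch. To obtain standard errors, one would repeat the model fit and this simulation on bootstrap resamples of subjects.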
Previous research has compared methods of estimation for multilevel models fit to binary data, but there are reasons to believe that the results will not always generalize to the ordinal case. This paper therefore evaluates (a) whether and when fitting multilevel linear models to ordinal outcome data is justified and (b) which estimator, Maximum Likelihood (ML) or Penalized Quasi-Likelihood (PQL), to employ when fitting multilevel cumulative logit models to ordinal data instead. ML and PQL are compared across variations in sample size, magnitude of variance components, number of outcome categories, and distribution shape. Fitting a multilevel linear model to ordinal outcomes is shown to be inferior in virtually all circumstances. PQL performance improves markedly with the number of ordinal categories, regardless of distribution shape. In contrast to binary data, PQL often performs as well as ML when used with ordinal data. Further, PQL typically outperforms ML when the data include a small to moderate number of clusters (i.e., ≤ 50 clusters).
Multilevel Models; Random Effects; Ordinal; Categorical; Cumulative Logit Model; Proportional Odds Model
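To make the cumulative logit model in the abstract above concrete, the sketch below converts a linear predictor into category probabilities. It assumes the convention $\operatorname{logit} P(Y \le k) = \theta_k - \eta$. In a multilevel model, $\eta$ would include the cluster's random effect. The function name is our own.

```python
import numpy as np

def cumulative_logit_probs(eta, thresholds):
    """Category probabilities under a cumulative logit (proportional odds) model.

    Assumed convention: logit P(Y <= k) = theta_k - eta for k = 1..K-1,
    with strictly increasing thresholds. Returns an array of shape (len(eta), K).
    """
    theta = np.asarray(thresholds, dtype=float)
    if np.any(np.diff(theta) <= 0):
        raise ValueError("thresholds must be strictly increasing")
    eta = np.atleast_1d(np.asarray(eta, dtype=float))
    cum = 1.0 / (1.0 + np.exp(-(theta[None, :] - eta[:, None])))  # P(Y <= k)
    # Pad with P(Y <= 0) = 0 and P(Y <= K) = 1, then difference adjacent columns
    cum = np.hstack([np.zeros((eta.size, 1)), cum, np.ones((eta.size, 1))])
    return np.diff(cum, axis=1)

# Five ordered categories, three values of the linear predictor
print(cumulative_logit_probs([-1.0, 0.0, 2.0], [-1.0, 0.0, 1.5, 3.0]))
```

With two categories the model reduces to ordinary logistic regression, $P(Y = 2) = \operatorname{expit}(\eta - \theta_1)$. The binary setting of earlier comparisons is therefore a special case of this model.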
Longitudinal studies of a binary outcome are common in the health, social, and behavioral sciences. In general, a feature of random effects logistic regression models for longitudinal binary data is that the marginal functional form, when integrated over the distribution of the random effects, is no longer of logistic form. Recently, Wang and Louis (2003) proposed a random intercept model in the clustered binary data setting where the marginal model has a logistic form. An acknowledged limitation of their model is that it allows only a single random effect that varies from cluster to cluster. In this paper, we propose a modification of their model to handle longitudinal data, allowing separate, but correlated, random intercepts at each measurement occasion. The proposed model allows for a flexible correlation structure among the random intercepts, where the correlations can be interpreted in terms of Kendall’s τ. For example, the marginal correlations among the repeated binary outcomes can decline with increasing time separation, while the model retains the property of having matching conditional and marginal logit link functions. Finally, the proposed method is used to analyze data from a longitudinal study designed to monitor cardiac abnormalities in children born to HIV-infected women.
Correlated binary data; multivariate normal distribution; probability integral transformation
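The key property of the Wang and Louis model is that the random intercept follows a "bridge" distribution. Its density is $f(b) = \sin(\phi\pi) / \{2\pi[\cosh(\phi b) + \cos(\phi\pi)]\}$ for $0 < \phi < 1$, which makes the average of $\operatorname{expit}(\eta + b)$ over $b$ exactly $\operatorname{expit}(\phi\eta)$. The Python sketch below checks this numerically. It then shows one possible construction for the longitudinal extension: AR(1)-correlated normals are passed through the normal CDF and the bridge inverse CDF, which is a probability integral transformation. The AR(1) choice, the grid settings and the function names are our assumptions, not the paper's specification.

```python
import math
import numpy as np

def bridge_pdf(b, phi):
    """Bridge density for the logit link, 0 < phi < 1 (Wang and Louis, 2003)."""
    b = np.asarray(b, dtype=float)
    return math.sin(phi * math.pi) / (
        2.0 * math.pi * (np.cosh(phi * b) + math.cos(phi * math.pi)))

def bridge_ppf(u, phi):
    """Inverse CDF of the bridge distribution."""
    u = np.clip(np.asarray(u, dtype=float), 1e-12, 1.0 - 1e-12)
    return np.log(np.sin(phi * math.pi * u) / np.sin(phi * math.pi * (1.0 - u))) / phi

def bridge_marginal_prob(eta, phi, half_width=200.0, n_grid=400_001):
    """Average expit(eta + b) over the bridge density with a fine Riemann sum."""
    grid = np.linspace(-half_width, half_width, n_grid)
    dx = grid[1] - grid[0]
    integrand = bridge_pdf(grid, phi) / (1.0 + np.exp(-(eta + grid)))
    return float(np.sum(integrand) * dx)

def correlated_bridge_intercepts(n_subjects, times, rho, phi, seed=0):
    """Random intercepts b_it with bridge marginals and AR(1)-type dependence.

    Assumed construction: z_i ~ N(0, R) with R[s, t] = rho ** |s - t|,
    u = Phi(z), b = F_bridge^{-1}(u). The map is monotone, so Kendall's tau
    between b_is and b_it equals that of z_is and z_it: (2 / pi) * arcsin(R[s, t]).
    """
    t = np.asarray(times, dtype=float)
    R = rho ** np.abs(t[:, None] - t[None, :])
    L = np.linalg.cholesky(R)
    rng = np.random.default_rng(seed)
    z = rng.standard_normal((n_subjects, t.size)) @ L.T
    u = 0.5 * (1.0 + np.vectorize(math.erf)(z / math.sqrt(2.0)))  # standard normal CDF
    return bridge_ppf(u, phi)
```

Because each intercept has a bridge marginal, every occasion keeps a marginal logit link with coefficients scaled by $\phi$. The latent correlation matrix controls how the dependence decays with time separation.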
For the analysis of the longitudinal hypertension family data, we focused on modeling binary traits of hypertension measured repeatedly over time. Our primary objective is to examine the predictive abilities of longitudinal models for genetic associations. We first identified single-nucleotide polymorphisms (SNPs) associated with any occurrence of hypertension over the study period, to set up covariates for the longitudinal analysis. We then analyzed the repeated binary hypertension measures longitudinally, with covariates including the SNPs, accounting for correlations arising from repeated outcomes and among family members.
We examined two popular models for longitudinal binary outcomes: (a) a marginal model based on generalized estimating equations, and (b) a conditional model based on logistic random effects regression. The effects of risk factors associated with repeated hypertension outcomes were compared between these two models, and their predictive abilities were assessed with and without genetic information.
Based on both approaches, we found a significant interaction between age and gender: men were at higher risk of hypertension before age 35 years, but after age 35 years women were at higher risk. Moreover, the SNPs were significantly associated with hypertension after adjusting for age, gender, and smoking status. The SNPs contributed more to predicting hypertension in the marginal model than in the conditional model. There was substantial correlation among repeated measures of hypertension, implying that hypertension status was strongly correlated with previous experience of hypertension. The conditional model performed better for predicting the future hypertension status of individuals.
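The marginal (GEE) and conditional (random effects) models compared above estimate different quantities, so their coefficients are not expected to agree. The Python sketch below averages a conditional logistic curve over a normal random intercept $b \sim N(0, \sigma^2)$ using Gauss-Hermite quadrature, then measures the slope of the resulting population-averaged curve on the logit scale. It compares that slope with the approximation $\beta_M \approx \beta_C/\sqrt{1 + c^2\sigma^2}$, $c = 16\sqrt{3}/(15\pi)$, of Zeger, Liang and Albert (1988). The normal random-intercept assumption and the function names are ours, and the approximation is only approximate.

```python
import numpy as np

def expit(x):
    return 1.0 / (1.0 + np.exp(-x))

def averaged_prob(eta, sigma, n_nodes=40):
    """Population-averaged P(Y = 1): expit(eta + b) averaged over b ~ N(0, sigma^2)."""
    nodes, weights = np.polynomial.hermite.hermgauss(n_nodes)
    b = np.sqrt(2.0) * sigma * nodes  # change of variables for N(0, sigma^2)
    eta = np.atleast_1d(np.asarray(eta, dtype=float))
    return expit(eta[:, None] + b[None, :]) @ weights / np.sqrt(np.pi)

def logit_slope_at(eta0, sigma, h=1e-4):
    """Logit-scale slope of the averaged curve at eta0, for a conditional slope of 1."""
    p = averaged_prob([eta0 - h, eta0 + h], sigma)
    logit = np.log(p / (1.0 - p))
    return (logit[1] - logit[0]) / (2.0 * h)

def zla_marginal_coef(beta_conditional, sigma):
    """Approximate marginal coefficient beta_C / sqrt(1 + c^2 sigma^2)."""
    c = 16.0 * np.sqrt(3.0) / (15.0 * np.pi)
    return beta_conditional / np.sqrt(1.0 + c**2 * sigma**2)

for sigma in (0.5, 1.0, 2.0):
    print(sigma, round(logit_slope_at(0.0, sigma), 3), round(zla_marginal_coef(1.0, sigma), 3))
```

The larger the random-intercept variance, the more the marginal coefficients shrink towards zero relative to the conditional ones. This is one reason effect sizes from the two modeling approaches should not be compared directly.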