# Related Articles

Random effects models are commonly used to analyze longitudinal categorical data. Marginalized random effects models are a class of models that permit direct estimation of marginal mean parameters and characterize serial correlation for longitudinal categorical data via random effects (Heagerty, 1999). Marginally specified logistic-normal models for longitudinal binary data. Biometrics
55, 688–698; Lee and Daniels, 2008. Marginalized models for longitudinal ordinal data with application to quality of life studies. Statistics in Medicine
27, 4359–4380). In this paper, we propose a Kronecker product (KP) covariance structure to capture the correlation between processes at a given time and the correlation within a process over time (serial correlation) for bivariate longitudinal ordinal data. For the latter, we consider a more general class of models than standard (first-order) autoregressive correlation models, by re-parameterizing the correlation matrix using partial autocorrelations (Daniels and Pourahmadi, 2009). Modeling covariance matrices via partial autocorrelations. Journal of Multivariate Analysis
100, 2352–2363). We assess the reasonableness of the KP structure with a score test. A maximum marginal likelihood estimation method is proposed utilizing a quasi-Newton algorithm with quasi-Monte Carlo integration of the random effects. We examine the effects of demographic factors on metabolic syndrome and C-reactive protein using the proposed models.

doi:10.1093/biostatistics/kxs058

PMCID: PMC3677737
PMID: 23365416

Kronecker product; Metabolic syndrome; Partial autocorrelation

SUMMARY

Random effects are often used in generalized linear models to explain the serial dependence for longitudinal categorical data. Marginalized random effects models (MREMs) for the analysis of longitudinal binary data have been proposed to permit likelihood-based estimation of marginal regression parameters. In this paper, we introduce an extension of the MREM to accommodate longitudinal ordinal data. Maximum marginal likelihood estimation is implemented utilizing quasi-Newton algorithms with Monte Carlo integration of the random effects. Our approach is applied to analyze the quality of life data from a recent colorectal cancer clinical trial. Dropout occurs at a high rate and is often due to tumor progression or death. To deal with progression/death, we use a mixture model for the joint distribution of longitudinal measures and progression/death times and principal stratification to draw causal inferences about survivors.

doi:10.1002/sim.3352

PMCID: PMC2858760
PMID: 18613246

marginalized likelihood-based models; ordinal data models; dropout

SUMMARY

In a unique longitudinal study of teen driving, risky driving behavior and the occurrence of crashes or near crashes are measured prospectively over the first 18 months of licensure. Of scientific interest is relating the two processes and developing a predictor of crashes from previous risky driving behavior. In this work, we propose two latent class models for relating risky driving behavior to the occurrence of a crash or near crash event. The first approach models the binary longitudinal crash/near crash outcome using a binary latent variable which depends on risky driving covariates and previous outcomes. A random effects model introduces heterogeneity among subjects in modeling the mean value of the latent state. The second approach extends the first model to the ordinal case where the latent state is composed of K ordinal classes. Additionally, we discuss an alternate hidden Markov model formulation. Estimation is performed using the expectation-maximization (EM) algorithm and Monte Carlo EM. We illustrate the importance of using these latent class modeling approaches through the analysis of the teen driving behavior.

doi:10.1111/j.1467-9876.2012.01065.x

PMCID: PMC4183151
PMID: 25284899

driving study; latent class modeling; Monte Carlo EM

A typical longitudinal study prospectively collects both repeated measures of a health status outcome as well as covariates that are used either as the primary predictor of interest or as important adjustment factors. In many situations, all covariates are measured on the entire study cohort. However, in some scenarios the primary covariates are time dependent yet may be ascertained retrospectively after completion of the study. One common example would be covariate measurements based on stored biological specimens such as blood plasma. While authors have previously proposed generalizations of the standard case–control design in which the clustered outcome measurements are used to selectively ascertain covariates (Neuhaus and Jewell, 1990) and therefore provide resource efficient collection of information, these designs do not appear to be commonly used. One potential barrier to the use of longitudinal outcome-dependent sampling designs would be the lack of a flexible class of likelihood-based analysis methods. With the relatively recent development of flexible and practical methods such as generalized linear mixed models (Breslow and Clayton, 1993) and marginalized models for categorical longitudinal data (see Heagerty and Zeger, 2000, for an overview), the class of likelihood-based methods is now sufficiently well developed to capture the major forms of longitudinal correlation found in biomedical repeated measures data. Therefore, the goal of this manuscript is to promote the consideration of outcome-dependent longitudinal sampling designs and to both outline and evaluate the basic conditional likelihood analysis allowing for valid statistical inference.

doi:10.1093/biostatistics/kxn006

PMCID: PMC2733177
PMID: 18372397

Binary data; Longitudinal data analysis; Marginal models; Marginalized models; Outcome-dependent sampling; Time-dependent covariates

Summary

Longitudinal studies with binary repeated measures are widespread in biomedical research. Marginal regression approaches for balanced binary data are well developed, while for binary process data, where measurement times are irregular and may differ by individuals, likelihood-based methods for marginal regression analysis are less well developed. In this article, we develop a Bayesian regression model for analyzing longitudinal binary process data, with emphasis on dealing with missingness. We focus on the settings where data are missing at random, which require a correctly specified joint distribution for the repeated measures in order to draw valid likelihood-based inference about the marginal mean. To provide maximum flexibility, the proposed model specifies both the marginal mean and serial dependence structures using nonparametric smooth functions. Serial dependence is allowed to depend on the time lag between adjacent outcomes as well as other relevant covariates. Inference is fully Bayesian. Using simulations, we show that adequate modeling of the serial dependence structure is necessary for valid inference of the marginal mean when the binary process data are missing at random. Longitudinal viral load data from the HIV Epidemiology Research Study (HERS) are analyzed for illustration.

doi:10.1002/sim.3265

PMCID: PMC2581820
PMID: 18351709

Repeated measures; Marginal model; Nonparametric regression; Penalized splines; HIV/AIDS; Antiviral treatment

Time index-ordered random variables are said to be antedependent (AD) of order (p1, p2, …, pn) if the kth variable, conditioned on the pk immediately preceding variables, is independent of all further preceding variables. Inferential methods associated with AD models are well developed for continuous (primarily normal) longitudinal data, but not for categorical longitudinal data. In this article, we develop likelihood-based inferential procedures for unstructured AD models for categorical longitudinal data. Specifically, we derive maximum likelihood estimators (mles) of model parameters; penalized likelihood criteria and likelihood ratio tests for determining the order of antedependence; and likelihood ratio tests for homogeneity across groups, time-invariance of transition probabilities, and strict stationarity. Closed-form expressions for mles and test statistics, which allow for the possibility of empty cells and monotone missing data, are given for all cases save strict stationarity. For data with an arbitrary missingness pattern, we derive an efficient restricted EM algorithm for obtaining mles. The performance of the tests is evaluated by simulation. The methods are applied to longitudinal studies of toenail infection severity (measured on a binary scale) and Alzheimer’s disease severity (measured on an ordinal scale). The analysis of the toenail infection severity data reveals interesting nonstationary behavior of the transition probabilities and indicates that an unstructured first-order AD model is superior to stationary and other structured first-order AD models that have previously been fit to these data. The analysis of the Alzheimer’s severity data indicates that the antedependence is second-order with time-invariant transition probabilities, suggesting the use of a second-order autoregressive cumulative logit model.

doi:10.1002/sim.5763

PMCID: PMC3885186
PMID: 23436682

Likelihood ratio test; Markov models; Missing data; Transition models

SUMMARY

We implement a joint model for mixed multivariate longitudinal measurements, applied to the prediction of time until lung transplant or death in idiopathic pulmonary fibrosis. Specifically, we formulate a unified Bayesian joint model for the mixed longitudinal responses and time-to-event outcomes. For the longitudinal model of continuous and binary responses, we investigate multivariate generalized linear mixed models using shared random effects. Longitudinal and time-to-event data are assumed to be independent conditional on available covariates and shared parameters. A Markov chain Monte Carlo (MCMC) algorithm, implemented in OpenBUGS, is used for parameter estimation. To illustrate practical considerations in choosing a final model, we fit 37 different candidate models using all possible combinations of random effects and employ a Deviance Information Criterion (DIC) to select a best fitting model. We demonstrate the prediction of future event probabilities within a fixed time interval for patients utilizing baseline data, post-baseline longitudinal responses, and the time-to-event outcome. The performance of our joint model is also evaluated in simulation studies.

doi:10.1080/02664763.2014.909784

PMCID: PMC4157686
PMID: 25214700

Idiopathic Pulmonary Fibrosis; Joint model; Mixed continuous and binary data; Multivariate longitudinal data; Prediction model; Shared parameter model; Survival analysis

SUMMARY

Existing joint models for longitudinal and survival data are not applicable for longitudinal ordinal outcomes with possible non-ignorable missing values caused by multiple reasons. We propose a joint model for longitudinal ordinal measurements and competing risks failure time data, in which a partial proportional odds model for the longitudinal ordinal outcome is linked to the event times by latent random variables. At the survival endpoint, our model adopts the competing risks framework to model multiple failure types at the same time. The partial proportional odds model, as an extension of the popular proportional odds model for ordinal outcomes, is more flexible and at the same time provides a tool to test the proportional odds assumption. We use a likelihood approach and derive an EM algorithm to obtain the maximum likelihood estimates of the parameters. We further show that all the parameters at the survival endpoint are identifiable from the data. Our joint model enables one to make inference for both the longitudinal ordinal outcome and the failure times simultaneously. In addition, the inference at the longitudinal endpoint is adjusted for possible non-ignorable missing data caused by the failure times. We apply the method to the NINDS rt-PA stroke trial. Our study considers the modified Rankin Scale only. Other ordinal outcomes in the trial, such as the Barthel and Glasgow scales can be treated in the same way.

doi:10.1002/sim.3798

PMCID: PMC2822130
PMID: 19943331

Within the pattern-mixture modeling framework for informative dropout, conditional linear models (CLMs) are a useful approach to deal with dropout that can occur at any point in continuous time (not just at observation times). However, in contrast with selection models, inferences about marginal covariate effects in CLMs are not readily available if nonidentity links are used in the mean structures. In this article, we propose a CLM for long series of longitudinal binary data with marginal covariate effects directly specified. The association between the binary responses and the dropout time is taken into account by modeling the conditional mean of the binary response as well as the dependence between the binary responses given the dropout time. Specifically, parameters in both the conditional mean and dependence models are assumed to be linear or quadratic functions of the dropout time; and the continuous dropout time distribution is left completely unspecified. Inference is fully Bayesian. We illustrate the proposed model using data from a longitudinal study of depression in HIV-infected women, where the strategy of sensitivity analysis based on the extrapolation method is also demonstrated.

doi:10.1093/biostatistics/kxr041

PMCID: PMC3297830
PMID: 22133756

Bayesian analysis; HIV/AIDS; Marginal model; Missing data; Sensitivity analysis

A major biomedical goal associated with evaluating a candidate biomarker or developing a predictive model score for event-time outcomes is to accurately distinguish between incident cases from the controls surviving beyond t throughout the entire study period. Extensions of standard binary classification measures like time-dependent sensitivity, specificity, and receiver operating characteristic (ROC) curves have been developed in this context (Heagerty, P. J., and others, 2000. Time-dependent ROC curves for censored survival data and a diagnostic marker. Biometrics
56, 337–344). We propose a direct, non-parametric method to estimate the time-dependent Area under the curve (AUC) which we refer to as the weighted mean rank (WMR) estimator. The proposed estimator performs well relative to the semi-parametric AUC curve estimator of Heagerty and Zheng (2005. Survival model predictive accuracy and ROC curves. Biometrics
61, 92–105). We establish the asymptotic properties of the proposed estimator and show that the accuracy of markers can be compared very simply using the difference in the WMR statistics. Estimators of pointwise standard errors are provided.

doi:10.1093/biostatistics/kxs021

PMCID: PMC3520498
PMID: 22734044

AUC curve; Survival analysis; Time-dependent ROC

Mixed-effects logistic regression models are described for analysis of longitudinal ordinal outcomes, where observations are observed clustered within subjects. Random effects are included in the model to account for the correlation of the clustered observations. Typically, the error variance and the variance of the random effects are considered to be homogeneous. These variance terms characterize the within-subjects (i.e., error variance) and between-subjects (i.e., random-effects variance) variation in the data. In this article, we describe how covariates can influence these variances, and also extend the standard logistic mixed model by adding a subject-level random effect to the within-subject variance specification. This permits subjects to have influence on the mean, or location, and variability, or (square of the) scale, of their responses. Additionally, we allow the random effects to be correlated. We illustrate application of these models for ordinal data using Ecological Momentary Assessment (EMA) data, or intensive longitudinal data, from an adolescent smoking study. These mixed-effects ordinal location scale models have useful applications in mental health research where outcomes are often ordinal and there is interest in subject heterogeneity, both between- and within-subjects.

PMCID: PMC2847414
PMID: 20357914

Complex variation; Mood variation; Heterogeneity; Variance modeling

Background

Most epidemiological studies of major depression report period prevalence estimates. These are of limited utility in characterizing the longitudinal epidemiology of this condition. Markov models provide a methodological framework for increasing the utility of epidemiological data. Markov models relating incidence and recovery to major depression prevalence have been described in a series of prior papers. In this paper, the models are extended to describe the longitudinal course of the disorder.

Methods

Data from three national surveys conducted by the Canadian national statistical agency (Statistics Canada) were used in this analysis. These data were integrated using a Markov model. Incidence, recurrence and recovery were represented as weekly transition probabilities. Model parameters were calibrated to the survey estimates.

Results

The population was divided into three categories: low, moderate and high recurrence groups. The size of each category was approximated using lifetime data from a study using the WHO Mental Health Composite International Diagnostic Interview (WMH-CIDI). Consistent with previous work, transition probabilities reflecting recovery were high in the initial weeks of the episodes, and declined by a fixed proportion with each passing week.

Conclusion

Markov models provide a framework for integrating psychiatric epidemiological data. Previous studies have illustrated the utility of Markov models for decomposing prevalence into its various determinants: incidence, recovery and mortality. This study extends the Markov approach by distinguishing several recurrence categories.

doi:10.1186/1478-7954-3-11

PMCID: PMC1298330
PMID: 16288648

Depressive Disorder; Epidemiologic Methods; Markov Chain

Summary

To develop more targeted intervention strategies, an important research goal is to identify markers predictive of clinical events. A crucial step towards this goal is to characterize the clinical performance of a marker for predicting different types of events. In this manuscript, we present statistical methods for evaluating the performance of a prognostic marker in predicting multiple competing events. To capture the potential time-varying predictive performance of the marker and incorporate competing risks, we define time- and cause-specific accuracy summaries by stratifying cases based on causes of failure. Such definition would allow one to evaluate the predictive accuracy of a marker for each type of event and compare its predictiveness across event types. Extending the nonparametric crude cause-specific ROC curve estimators by Saha and Heagerty (2010), we develop inference procedures for a range of cause-specific accuracy summaries. To estimate the accuracy measures and assess how covariates may affect the accuracy of a marker under the competing risk setting, we consider two forms of semiparametric models through the cause-specific hazard framework. These approaches enable a flexible modeling of the relationships between the marker and failure times for each cause, while efficiently accommodating additional covariates. We investigate the asymptotic property of the proposed accuracy estimators and demonstrate the finite sample performance of these estimators through simulation studies. The proposed procedures are illustrated with data from a prostate cancer prognostic study.

doi:10.1111/j.1541-0420.2011.01671.x

PMCID: PMC3694786
PMID: 22150576

Biomarker evaluation; Cause-specific Hazard; Competing risk; Negative predictive value; Positive predictive value; Receiver Operating Characteristics Curve (ROC curve); Survival analysis

Background

Graphical techniques can provide visually compelling insights into complex data patterns. In this paper we present a type of lasagne plot showing changes in categorical variables for participants measured at regular intervals over time and propose statistical models to estimate distributions of marginal and transitional probabilities.

Methods

The plot uses stacked bars to show the distribution of categorical variables at each time interval, with different colours to depict different categories and changes in colours showing trajectories of participants over time. The models are based on nominal logistic regression which is appropriate for both ordinal and nominal categorical variables. To illustrate the plots and models we analyse data on smoking status, body mass index (BMI) and physical activity level from a longitudinal study on women’s health. To estimate marginal distributions we fit survey wave as an explanatory variable whereas for transitional distributions we fit status of participants (e.g. smoking status) at previous surveys.

Results

For the illustrative data the marginal models showed BMI increasing, physical activity decreasing and smoking decreasing linearly over time at the population level. The plots and transition models showed smoking status to be highly predictable for individuals whereas BMI was only moderately predictable and physical activity was virtually unpredictable. Most of the predictive power was obtained from participant status at the previous survey. Predicted probabilities from the models mostly agreed with observed probabilities indicating adequate goodness-of-fit.

Conclusions

The proposed form of lasagne plot provides a simple visual aid to show transitions in categorical variables over time in longitudinal studies. The suggested models complement the plot and allow formal testing and estimation of marginal and transitional distributions. These simple tools can provide valuable insights into categorical data on individuals measured at regular intervals over time.

doi:10.1186/1471-2288-14-32

PMCID: PMC3938907
PMID: 24576041

Categorical variables; Graphical methods; Longitudinal studies; Marginal distribution; Nominal regression; Transition probabilities

Summary

In this article we study a joint model for longitudinal measurements and competing risks survival data. Our joint model provides a flexible approach to handle possible nonignorable missing data in the longitudinal measurements due to dropout. It is also an extension of previous joint models with a single failure type, offering a possible way to model informatively censored events as a competing risk. Our model consists of a linear mixed effects submodel for the longitudinal outcome and a proportional cause-specific hazards frailty submodel (Prentice et al., 1978, Biometrics 34, 541-554) for the competing risks survival data, linked together by some latent random effects. We propose to obtain the maximum likelihood estimates of the parameters by an expectation maximization (EM) algorithm and estimate their standard errors using a profile likelihood method. The developed method works well in our simulation studies and is applied to a clinical trial for the scleroderma lung disease.

doi:10.1111/j.1541-0420.2007.00952.x

PMCID: PMC2751647
PMID: 18162112

Cause-specific hazard; Competing risks; EM algorithm; Joint modeling; Longitudinal data; Mixed effects model

Recurrent event data are frequently encountered in studies with longitudinal designs. Let the recurrence time be the time between two successive recurrent events. Recurrence times can be treated as a type of correlated survival data in statistical analysis. In general, because of the ordinal nature of recurrence times, statistical methods that are appropriate for standard correlated survival data in marginal models may not be applicable to recurrence time data. Specifically, for estimating the marginal survival function, the Kaplan-Meier estimator derived from the pooled recurrence times serves as a consistent estimator for standard correlated survival data but not for recurrence time data. In this article we consider the problem of how to estimate the marginal survival function in nonparametric models. A class of nonparametric estimators is introduced. The appropriateness of the estimators is confirmed by statistical theory and simulations. Simulation and analysis from schizophrenia data are presented to illustrate the estimators' performance.

PMCID: PMC3826567
PMID: 24244058

Correlated survival data; Frailty; Kaplan-Meier estimate; Longitudinal designs; Recurrent event

Summary

In this article we consider the problem of fitting pattern mixture models to longitudinal data when there are many unique dropout times. We propose a marginally specified latent class pattern mixture model. The marginal mean is assumed to follow a generalized linear model, whereas the mean conditional on the latent class and random effects is specified separately. Because the dimension of the parameter vector of interest (the marginal regression coefficients) does not depend on the assumed number of latent classes, we propose to treat the number of latent classes as a random variable. We specify a prior distribution for the number of classes, and calculate (approximate) posterior model probabilities. In order to avoid the complications with implementing a fully Bayesian model, we propose a simple approximation to these posterior probabilities. The ideas are illustrated using data from a longitudinal study of depression in HIV-infected women.

doi:10.1111/j.1541-0420.2007.00884.x

PMCID: PMC2791415
PMID: 17900312

Bayesian model averaging; Incomplete data; Latent variable; Marginal model; Random effects

At both the individual and societal levels, the health and economic burden of disability in older adults is enormous in developed countries, including the U.S. Recent studies have revealed that the disablement process in older adults often comprises episodic periods of impaired functioning and periods that are relatively free of disability, amid a secular and natural trend of decline in functioning. Rather than an irreversible, progressive event that is analogous to a chronic disease, disability is better conceptualized and mathematically modeled as states that do not necessarily follow a strict linear order of good-to-bad. Statistical tools, including Markov models, which allow bidirectional transition between states, and random effects models, which allow individual-specific rate of secular decline, are pertinent. In this paper, we propose a mixed effects, multivariate, hidden Markov model to handle partially ordered disability states. The model generalizes the continuation ratio model for ordinal data in the generalized linear model literature and provides a formal framework for testing the effects of risk factors and/or an intervention on the transitions between different disability states. Under a generalization of the proportional odds ratio assumption, the proposed model circumvents the problem of a potentially large number of parameters when the number of states and the number of covariates are substantial. We describe a maximum likelihood method for estimating the partially ordered, mixed effects model and show how the model can be applied to a longitudinal data set that consists of N = 2,903 older adults followed for 10 years in the Health Aging and Body Composition Study. We further statistically test the effects of various risk factors upon the probabilities of transition into various severe disability states. The result can be used to inform geriatric and public health science researchers who study the disablement process.

doi:10.1080/01621459.2013.770307

PMCID: PMC3777389
PMID: 24058222

Latent Markov model; continuation ratio model; EM algorithm; generalized linear model; Health ABC study

Objective

To estimate the association between local clean indoor air ordinances and prenatal maternal smoking across 351 municipalities in Massachusetts before the 2004 statewide ban and to test the effect of time since ordinance adoption on the association.

Methods

The authors linked 2002 birth certificate data of women who gave birth in the state and reported a Massachusetts residence (n=67 584) to a database of indoor smoking ordinances in all municipalities. Multilevel regression models accounting for individual- and municipality-level variables estimate the associations between the presence of local smoking ordinances, strength of the ordinances, time since ordinance adoption and prenatal smoking.

Results

Compared with those living in municipalities with no ordinances, women living in municipalities with a smoking ordinance had lower odds of prenatal smoking (OR=0.72, CI=0.53 to 0.98). No effect was found for 100% smoke-free ordinances. For the analyses testing the effect of time, pregnant women living in municipalities with ordinances enacted >2 years were less likely to smoke than those in municipalities with more recent (<1 year) ordinances.

Conclusions

Preventing smoking among women of reproductive age is a public health priority. This study suggests that indoor smoking ordinances were associated with lower prenatal smoking prevalence and the favourable effect increased over time. Findings highlight the public health benefit of tobacco control policies.

doi:10.1136/tobaccocontrol-2011-050157

PMCID: PMC3401240
PMID: 22166267

SUMMARY

In a large, prospective longitudinal study designed to monitor cardiac abnormalities in children born to HIV-infected women, instead of a single outcome variable, there are multiple binary outcomes (e.g., abnormal heart rate, abnormal blood pressure, abnormal heart wall thickness) considered as joint measures of heart function over time. In the presence of missing responses at some time points, longitudinal marginal models for these multiple outcomes can be estimated using generalized estimating equations (GEE) (Liang and Zeger, 1986), and consistent estimates can be obtained under the assumption of a missing completely at random (MCAR) mechanism. When the missing data mechanism is missing at random (MAR), that is the probability of missing a particular outcome at a time-point depends on observed values of that outcome and the remaining outcomes at other time points, we propose joint estimation of the marginal models using a single modified GEE based on an EM-type algorithm. The proposed method is motivated by the longitudinal study of cardiac abnormalities in children born to HIV-infected women and analyses of these data are presented to illustrate the application of the method. Further, in an asymptotic study of bias, we show that under an MAR mechanism in which missingness depends on all observed outcome variables, our joint estimation via the modified GEE produces almost unbiased estimates, provided the correlation model has been correctly specified, whereas estimates from standard GEE can lead to substantial bias.

doi:10.1111/j.1467-985X.2008.00564.x

PMCID: PMC2888330
PMID: 20585409

EM-type algorithm; generalized estimating equations; missing at random; missing completely at random

The Agincourt Health and Demographic Surveillance System has since 2001 conducted a biannual household asset survey in order to quantify household socio-economic status (SES) in a rural population living in northeast South Africa. The survey contains binary, ordinal and nominal items. In the absence of income or expenditure data, the SES landscape in the study population is explored and described by clustering the households into homogeneous groups based on their asset status.

A model-based approach to clustering the Agincourt households, based on latent variable models, is proposed. In the case of modeling binary or ordinal items, item response theory models are employed. For nominal survey items, a factor analysis model, similar in nature to a multinomial probit model, is used. Both model types have an underlying latent variable structure—this similarity is exploited and the models are combined to produce a hybrid model capable of handling mixed data types. Further, a mixture of the hybrid models is considered to provide clustering capabilities within the context of mixed binary, ordinal and nominal response data. The proposed model is termed a mixture of factor analyzers for mixed data (MFA-MD).

The MFA-MD model is applied to the survey data to cluster the Agincourt households into homogeneous groups. The model is estimated within the Bayesian paradigm, using a Markov chain Monte Carlo algorithm. Intuitive groupings result, providing insight to the different socio-economic strata within the Agincourt region.

doi:10.1214/14-AOAS726

PMCID: PMC4256055
PMID: 25485026

Clustering; mixed data; item response theory; Metropolis-within-Gibbs

SUMMARY

Time-to-event data with time-varying covariates pose an interesting challenge for statistical modeling and inference, especially where the data require a regression structure but are not consistent with the proportional hazard assumption. Threshold regression (TR) is a relatively new methodology based on the concept that degradation or deterioration of a subject’s health follows a stochastic process and failure occurs when the process first reaches a failure state or threshold (a first-hitting-time). Survival data with time-varying covariates consist of sequential observations on the level of degradation and/or on covariates of the subject, prior to the occurrence of the failure event. Encounters with this type of data structure abound in practical settings for survival analysis and there is a pressing need for simple regression methods to handle the longitudinal aspect of the data. Using a Markov property to decompose a longitudinal record into a series of single records is one strategy for dealing with this type of data. This study looks at the theoretical conditions for which this Markov approach is valid. The approach is called threshold regression with Markov decomposition or Markov TR for short. A number of important special cases, such as data with unevenly spaced time points and competing risks as stopping modes, are discussed. We show that a proportional hazards regression model with time-varying covariates is consistent with the Markov TR model. The Markov TR procedure is illustrated by a case application to a study of lung cancer risk. The procedure is also shown to be consistent with the use of an alternative time scale. Finally, we present the connection of the procedure to the concept of a collapsible survival model.

doi:10.1002/sim.3808

PMCID: PMC3063107
PMID: 20213704

Competing risks; first hitting time; latent process; longitudinal data; Markov property; stopping time; unevenly spaced time points; Wiener diffusion process

Neuroimaging data are increasingly being used to predict potential outcomes or groupings, such as clinical severity, drug dose response, and transitional illness states. In these examples, the variable (target) we want to predict is ordinal in nature. Conventional classification schemes assume that the targets are nominal and hence ignore their ranked nature, whereas parametric and/or non-parametric regression models enforce a metric notion of distance between classes. Here, we propose a novel, alternative multivariate approach that overcomes these limitations — whole brain probabilistic ordinal regression using a Gaussian process framework. We applied this technique to two data sets of pharmacological neuroimaging data from healthy volunteers. The first study was designed to investigate the effect of ketamine on brain activity and its subsequent modulation with two compounds — lamotrigine and risperidone. The second study investigates the effect of scopolamine on cerebral blood flow and its modulation using donepezil. We compared ordinal regression to multi-class classification schemes and metric regression. Considering the modulation of ketamine with lamotrigine, we found that ordinal regression significantly outperformed multi-class classification and metric regression in terms of accuracy and mean absolute error. However, for risperidone ordinal regression significantly outperformed metric regression but performed similarly to multi-class classification both in terms of accuracy and mean absolute error. For the scopolamine data set, ordinal regression was found to outperform both multi-class and metric regression techniques considering the regional cerebral blood flow in the anterior cingulate cortex. Ordinal regression was thus the only method that performed well in all cases. Our results indicate the potential of an ordinal regression approach for neuroimaging data while providing a fully probabilistic framework with elegant approaches for model selection.

Highlights

•Often in neuroimaging the independent variables are ranked or ordered.•Classification and regression models cannot explicitly model an ordinal target.•We present a novel multivariate ordinal regression approach for neuroimaging data.•Our results show that ordinal regression is a powerful method for ranking data.

doi:10.1016/j.neuroimage.2013.05.036

PMCID: PMC4068378
PMID: 23684876

Multivariate; Ordinal regression; Gaussian processes; Pharmacological MRI; Ketamine; Scopolamine

Analysis of longitudinal ordered categorical efficacy or safety data in clinical trials using mixed models is increasingly performed. However, algorithms available for maximum likelihood estimation using an approximation of the likelihood integral, including LAPLACE approach, may give rise to biased parameter estimates. The SAEM algorithm is an efficient and powerful tool in the analysis of continuous/count mixed models. The aim of this study was to implement and investigate the performance of the SAEM algorithm for longitudinal categorical data. The SAEM algorithm is extended for parameter estimation in ordered categorical mixed models together with an estimation of the Fisher information matrix and the likelihood. We used Monte Carlo simulations using previously published scenarios evaluated with NONMEM. Accuracy and precision in parameter estimation and standard error estimates were assessed in terms of relative bias and root mean square error. This algorithm was illustrated on the simultaneous analysis of pharmacokinetic and discretized efficacy data obtained after a single dose of warfarin in healthy volunteers. The new SAEM algorithm is implemented in MONOLIX 3.1 for discrete mixed models. The analyses show that for parameter estimation, the relative bias is low for both fixed effects and variance components in all models studied. Estimated and empirical standard errors are similar. The warfarin example illustrates how simple and rapid it is to analyze simultaneously continuous and discrete data with MONOLIX 3.1. The SAEM algorithm is extended for analysis of longitudinal categorical data. It provides accurate estimates parameters and standard errors. The estimation is fast and stable.

doi:10.1208/s12248-010-9238-5

PMCID: PMC3032088
PMID: 21063925

categorical data; mixed models; MONOLIX; proportional odds model; SAEM

Longitudinal ordinal data are common in many scientific studies, including those of multiple sclerosis (MS), and are frequently modeled using Markov dependency. Several authors have proposed random-effects Markov models to account for heterogeneity in the population. In this paper, we go one step further and study prediction based on random-effects Markov models. In particular, we show how to calculate the probabilities of future events and confidence intervals for those probabilities, given observed data on the ordinal outcome and a set of covariates, and how to update them over time. We discuss the usefulness of depicting these probabilities for visualization and interpretation of model results and illustrate our method using data from a phase III clinical trial that evaluated the utility of interferon beta-1a (trademark Avonex) to MS patients of type relapsing–remitting.

doi:10.1093/biostatistics/kxn008

PMCID: PMC2536724
PMID: 18424785

Markov model; Ordinal response; Prediction; Transition model