Related Articles

1.  Flexible marginalized models for bivariate longitudinal ordinal data 
Biostatistics (Oxford, England)  2013;14(3):462-476.
Random effects models are commonly used to analyze longitudinal categorical data. Marginalized random effects models are a class of models that permit direct estimation of marginal mean parameters and characterize serial correlation for longitudinal categorical data via random effects (Heagerty, 1999. Marginally specified logistic-normal models for longitudinal binary data. Biometrics 55, 688–698; Lee and Daniels, 2008. Marginalized models for longitudinal ordinal data with application to quality of life studies. Statistics in Medicine 27, 4359–4380). In this paper, we propose a Kronecker product (KP) covariance structure to capture the correlation between processes at a given time and the correlation within a process over time (serial correlation) for bivariate longitudinal ordinal data. For the latter, we consider a more general class of models than standard (first-order) autoregressive correlation models, by re-parameterizing the correlation matrix using partial autocorrelations (Daniels and Pourahmadi, 2009. Modeling covariance matrices via partial autocorrelations. Journal of Multivariate Analysis 100, 2352–2363). We assess the reasonableness of the KP structure with a score test. A maximum marginal likelihood estimation method is proposed utilizing a quasi-Newton algorithm with quasi-Monte Carlo integration of the random effects. We examine the effects of demographic factors on metabolic syndrome and C-reactive protein using the proposed models.
PMCID: PMC3677737  PMID: 23365416
Kronecker product; Metabolic syndrome; Partial autocorrelation
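The Kronecker product structure mentioned in the abstract above can be illustrated with a small numerical sketch. This is not the authors' implementation: the function names, the 2 x 2 between-process matrix, and the use of a plain first-order autoregressive (AR(1)) serial correlation are all assumptions chosen for illustration, whereas the paper considers a more general class based on partial autocorrelations.

```python
import numpy as np

def ar1_correlation(n_times, rho):
    """AR(1) serial correlation matrix: corr(t, s) = rho ** |t - s|."""
    idx = np.arange(n_times)
    return rho ** np.abs(idx[:, None] - idx[None, :])

def kp_covariance(between, within):
    """Kronecker product of a between-process matrix and a within-process
    (serial) matrix. Rows are ordered process by process, then by time."""
    return np.kron(between, within)

# Hypothetical values: two processes with correlation 0.4 at a given time,
# three time points with serial correlation 0.5.
between = np.array([[1.0, 0.4],
                    [0.4, 1.0]])
within = ar1_correlation(3, 0.5)
sigma = kp_covariance(between, within)  # 6 x 6 joint correlation matrix
```

With this ordering, the entry for process 1 at time 1 against process 2 at time 1 equals the between-process correlation, and entries within one process reproduce the serial correlation matrix.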
2.  Marginalized models for longitudinal ordinal data with application to quality of life studies 
Statistics in medicine  2008;27(21):4359-4380.
Random effects are often used in generalized linear models to explain the serial dependence for longitudinal categorical data. Marginalized random effects models (MREMs) for the analysis of longitudinal binary data have been proposed to permit likelihood-based estimation of marginal regression parameters. In this paper, we introduce an extension of the MREM to accommodate longitudinal ordinal data. Maximum marginal likelihood estimation is implemented utilizing quasi-Newton algorithms with Monte Carlo integration of the random effects. Our approach is applied to analyze the quality of life data from a recent colorectal cancer clinical trial. Dropout occurs at a high rate and is often due to tumor progression or death. To deal with progression/death, we use a mixture model for the joint distribution of longitudinal measures and progression/death times and principal stratification to draw causal inferences about survivors.
PMCID: PMC2858760  PMID: 18613246
marginalized likelihood-based models; ordinal data models; dropout
3.  Ordinal latent variable models and their application in the study of newly licensed teenage drivers 
In a unique longitudinal study of teen driving, risky driving behavior and the occurrence of crashes or near crashes are measured prospectively over the first 18 months of licensure. Of scientific interest is relating the two processes and developing a predictor of crashes from previous risky driving behavior. In this work, we propose two latent class models for relating risky driving behavior to the occurrence of a crash or near crash event. The first approach models the binary longitudinal crash/near crash outcome using a binary latent variable which depends on risky driving covariates and previous outcomes. A random effects model introduces heterogeneity among subjects in modeling the mean value of the latent state. The second approach extends the first model to the ordinal case where the latent state is composed of K ordinal classes. Additionally, we discuss an alternate hidden Markov model formulation. Estimation is performed using the expectation-maximization (EM) algorithm and Monte Carlo EM. We illustrate the importance of using these latent class modeling approaches through the analysis of the teen driving behavior.
PMCID: PMC4183151  PMID: 25284899
driving study; latent class modeling; Monte Carlo EM
4.  On outcome-dependent sampling designs for longitudinal binary response data with time-varying covariates 
Biostatistics (Oxford, England)  2008;9(4):735-749.
A typical longitudinal study prospectively collects both repeated measures of a health status outcome as well as covariates that are used either as the primary predictor of interest or as important adjustment factors. In many situations, all covariates are measured on the entire study cohort. However, in some scenarios the primary covariates are time dependent yet may be ascertained retrospectively after completion of the study. One common example would be covariate measurements based on stored biological specimens such as blood plasma. While authors have previously proposed generalizations of the standard case–control design in which the clustered outcome measurements are used to selectively ascertain covariates (Neuhaus and Jewell, 1990) and therefore provide resource efficient collection of information, these designs do not appear to be commonly used. One potential barrier to the use of longitudinal outcome-dependent sampling designs would be the lack of a flexible class of likelihood-based analysis methods. With the relatively recent development of flexible and practical methods such as generalized linear mixed models (Breslow and Clayton, 1993) and marginalized models for categorical longitudinal data (see Heagerty and Zeger, 2000, for an overview), the class of likelihood-based methods is now sufficiently well developed to capture the major forms of longitudinal correlation found in biomedical repeated measures data. Therefore, the goal of this manuscript is to promote the consideration of outcome-dependent longitudinal sampling designs and to both outline and evaluate the basic conditional likelihood analysis allowing for valid statistical inference.
PMCID: PMC2733177  PMID: 18372397
Binary data; Longitudinal data analysis; Marginal models; Marginalized models; Outcome-dependent sampling; Time-dependent covariates
5.  Bayesian Semiparametric Regression for Longitudinal Binary Processes with Missing Data 
Statistics in medicine  2008;27(17):3247-3268.
Longitudinal studies with binary repeated measures are widespread in biomedical research. Marginal regression approaches for balanced binary data are well developed, while for binary process data, where measurement times are irregular and may differ by individuals, likelihood-based methods for marginal regression analysis are less well developed. In this article, we develop a Bayesian regression model for analyzing longitudinal binary process data, with emphasis on dealing with missingness. We focus on the settings where data are missing at random, which require a correctly specified joint distribution for the repeated measures in order to draw valid likelihood-based inference about the marginal mean. To provide maximum flexibility, the proposed model specifies both the marginal mean and serial dependence structures using nonparametric smooth functions. Serial dependence is allowed to depend on the time lag between adjacent outcomes as well as other relevant covariates. Inference is fully Bayesian. Using simulations, we show that adequate modeling of the serial dependence structure is necessary for valid inference of the marginal mean when the binary process data are missing at random. Longitudinal viral load data from the HIV Epidemiology Research Study (HERS) are analyzed for illustration.
PMCID: PMC2581820  PMID: 18351709
Repeated measures; Marginal model; Nonparametric regression; Penalized splines; HIV/AIDS; Antiviral treatment
6.  Antedependence Models for Nonstationary Categorical Longitudinal Data with Ignorable Missingness: Likelihood-Based Inference 
Statistics in medicine  2013;32(19):10.1002/sim.5763.
Time index-ordered random variables are said to be antedependent (AD) of order (p1, p2, …, pn) if the kth variable, conditioned on the pk immediately preceding variables, is independent of all further preceding variables. Inferential methods associated with AD models are well developed for continuous (primarily normal) longitudinal data, but not for categorical longitudinal data. In this article, we develop likelihood-based inferential procedures for unstructured AD models for categorical longitudinal data. Specifically, we derive maximum likelihood estimators (mles) of model parameters; penalized likelihood criteria and likelihood ratio tests for determining the order of antedependence; and likelihood ratio tests for homogeneity across groups, time-invariance of transition probabilities, and strict stationarity. Closed-form expressions for mles and test statistics, which allow for the possibility of empty cells and monotone missing data, are given for all cases save strict stationarity. For data with an arbitrary missingness pattern, we derive an efficient restricted EM algorithm for obtaining mles. The performance of the tests is evaluated by simulation. The methods are applied to longitudinal studies of toenail infection severity (measured on a binary scale) and Alzheimer’s disease severity (measured on an ordinal scale). The analysis of the toenail infection severity data reveals interesting nonstationary behavior of the transition probabilities and indicates that an unstructured first-order AD model is superior to stationary and other structured first-order AD models that have previously been fit to these data. The analysis of the Alzheimer’s severity data indicates that the antedependence is second-order with time-invariant transition probabilities, suggesting the use of a second-order autoregressive cumulative logit model.
PMCID: PMC3885186  PMID: 23436682
Likelihood ratio test; Markov models; Missing data; Transition models
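For the first-order case with complete data, the maximum likelihood estimates of time-specific transition probabilities reduce to observed conditional proportions, which the sketch below computes. It is a simplified illustration only: the function name and toy data are hypothetical, higher orders and missing data are not handled, and rows with no observations are returned as NaN rather than treated as in the paper.

```python
import numpy as np

def transition_mles(data, n_categories):
    """Estimate P(Y_t = j | Y_{t-1} = i) separately for each time t.

    data: (n_subjects, n_times) array of category codes 0..n_categories-1.
    Returns one (n_categories, n_categories) matrix per transition.
    """
    data = np.asarray(data, dtype=int)
    n_times = data.shape[1]
    estimates = []
    for t in range(1, n_times):
        counts = np.zeros((n_categories, n_categories))
        # Tally each subject's (previous, current) pair of categories.
        np.add.at(counts, (data[:, t - 1], data[:, t]), 1)
        row_totals = counts.sum(axis=1, keepdims=True)
        with np.errstate(invalid="ignore", divide="ignore"):
            probs = np.where(row_totals > 0, counts / row_totals, np.nan)
        estimates.append(probs)
    return estimates

# Hypothetical toy data: 4 subjects, 3 occasions, binary outcome.
toy = [[0, 0, 1],
       [0, 1, 1],
       [1, 1, 0],
       [0, 0, 0]]
estimates = transition_mles(toy, n_categories=2)
```

Because a separate matrix is estimated at every occasion, comparing the matrices across time gives an informal view of whether the transition probabilities look time-invariant.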
7.  Prediction of transplant-free survival in idiopathic pulmonary fibrosis patients using joint models for event times and mixed multivariate longitudinal data 
Journal of applied statistics  2014;41(10):2192-2205.
We implement a joint model for mixed multivariate longitudinal measurements, applied to the prediction of time until lung transplant or death in idiopathic pulmonary fibrosis. Specifically, we formulate a unified Bayesian joint model for the mixed longitudinal responses and time-to-event outcomes. For the longitudinal model of continuous and binary responses, we investigate multivariate generalized linear mixed models using shared random effects. Longitudinal and time-to-event data are assumed to be independent conditional on available covariates and shared parameters. A Markov chain Monte Carlo (MCMC) algorithm, implemented in OpenBUGS, is used for parameter estimation. To illustrate practical considerations in choosing a final model, we fit 37 different candidate models using all possible combinations of random effects and employ a Deviance Information Criterion (DIC) to select a best fitting model. We demonstrate the prediction of future event probabilities within a fixed time interval for patients utilizing baseline data, post-baseline longitudinal responses, and the time-to-event outcome. The performance of our joint model is also evaluated in simulation studies.
PMCID: PMC4157686  PMID: 25214700
Idiopathic Pulmonary Fibrosis; Joint model; Mixed continuous and binary data; Multivariate longitudinal data; Prediction model; Shared parameter model; Survival analysis
8.  Joint modeling of longitudinal ordinal data and competing risks survival times and analysis of the NINDS rt-PA stroke trial 
Statistics in medicine  2010;29(5):546-557.
Existing joint models for longitudinal and survival data are not applicable for longitudinal ordinal outcomes with possible non-ignorable missing values caused by multiple reasons. We propose a joint model for longitudinal ordinal measurements and competing risks failure time data, in which a partial proportional odds model for the longitudinal ordinal outcome is linked to the event times by latent random variables. At the survival endpoint, our model adopts the competing risks framework to model multiple failure types at the same time. The partial proportional odds model, as an extension of the popular proportional odds model for ordinal outcomes, is more flexible and at the same time provides a tool to test the proportional odds assumption. We use a likelihood approach and derive an EM algorithm to obtain the maximum likelihood estimates of the parameters. We further show that all the parameters at the survival endpoint are identifiable from the data. Our joint model enables one to make inference for both the longitudinal ordinal outcome and the failure times simultaneously. In addition, the inference at the longitudinal endpoint is adjusted for possible non-ignorable missing data caused by the failure times. We apply the method to the NINDS rt-PA stroke trial. Our study considers the modified Rankin Scale only. Other ordinal outcomes in the trial, such as the Barthel and Glasgow scales can be treated in the same way.
PMCID: PMC2822130  PMID: 19943331
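As a rough illustration of the proportional odds idea referred to above, the sketch below turns cumulative logits into category probabilities. The parameterization logit P(Y <= k) = cutpoint_k - linear predictor is one common convention and is an assumption here, as are the function name and numeric values; the joint model, latent variables, and EM algorithm of the paper are not represented.

```python
import numpy as np

def category_probs(cutpoints, linear_predictor):
    """Category probabilities from a cumulative logit model.

    cutpoints: increasing thresholds, one fewer than the number of categories.
    linear_predictor: a scalar (proportional odds), or one value per cutpoint
    (a partial proportional odds style relaxation).
    """
    cutpoints = np.asarray(cutpoints, dtype=float)
    cumulative = 1.0 / (1.0 + np.exp(-(cutpoints - linear_predictor)))
    cumulative = np.concatenate([cumulative, [1.0]])
    # Successive differences of cumulative probabilities give P(Y = k).
    return np.diff(cumulative, prepend=0.0)

# Hypothetical three-category example with a zero linear predictor.
probs = category_probs([-1.0, 1.0], 0.0)
```

When a separate linear predictor is supplied for each cutpoint, the cumulative probabilities must still be non-decreasing for the result to be a valid distribution; this sketch does not check that.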
9.  A marginalized conditional linear model for longitudinal binary data when informative dropout occurs in continuous time 
Biostatistics (Oxford, England)  2011;13(2):355-368.
Within the pattern-mixture modeling framework for informative dropout, conditional linear models (CLMs) are a useful approach to deal with dropout that can occur at any point in continuous time (not just at observation times). However, in contrast with selection models, inferences about marginal covariate effects in CLMs are not readily available if nonidentity links are used in the mean structures. In this article, we propose a CLM for long series of longitudinal binary data with marginal covariate effects directly specified. The association between the binary responses and the dropout time is taken into account by modeling the conditional mean of the binary response as well as the dependence between the binary responses given the dropout time. Specifically, parameters in both the conditional mean and dependence models are assumed to be linear or quadratic functions of the dropout time; and the continuous dropout time distribution is left completely unspecified. Inference is fully Bayesian. We illustrate the proposed model using data from a longitudinal study of depression in HIV-infected women, where the strategy of sensitivity analysis based on the extrapolation method is also demonstrated.
PMCID: PMC3297830  PMID: 22133756
Bayesian analysis; HIV/AIDS; Marginal model; Missing data; Sensitivity analysis
10.  Non-parametric estimation of a time-dependent predictive accuracy curve 
A major biomedical goal associated with evaluating a candidate biomarker or developing a predictive model score for event-time outcomes is to accurately distinguish incident cases from the controls surviving beyond t throughout the entire study period. Extensions of standard binary classification measures like time-dependent sensitivity, specificity, and receiver operating characteristic (ROC) curves have been developed in this context (Heagerty, P. J., and others, 2000. Time-dependent ROC curves for censored survival data and a diagnostic marker. Biometrics 56, 337–344). We propose a direct, non-parametric method to estimate the time-dependent area under the curve (AUC), which we refer to as the weighted mean rank (WMR) estimator. The proposed estimator performs well relative to the semi-parametric AUC curve estimator of Heagerty and Zheng (2005. Survival model predictive accuracy and ROC curves. Biometrics 61, 92–105). We establish the asymptotic properties of the proposed estimator and show that the accuracy of markers can be compared very simply using the difference in the WMR statistics. Estimators of pointwise standard errors are provided.
PMCID: PMC3520498  PMID: 22734044
AUC curve; Survival analysis; Time-dependent ROC
11.  A mixed ordinal location scale model for analysis of Ecological Momentary Assessment (EMA) data* 
Statistics and its interface  2009;2(4):391-401.
Mixed-effects logistic regression models are described for analysis of longitudinal ordinal outcomes, where observations are clustered within subjects. Random effects are included in the model to account for the correlation of the clustered observations. Typically, the error variance and the variance of the random effects are considered to be homogeneous. These variance terms characterize the within-subjects (i.e., error variance) and between-subjects (i.e., random-effects variance) variation in the data. In this article, we describe how covariates can influence these variances, and also extend the standard logistic mixed model by adding a subject-level random effect to the within-subject variance specification. This permits subjects to have influence on the mean, or location, and variability, or (square of the) scale, of their responses. Additionally, we allow the random effects to be correlated. We illustrate application of these models for ordinal data using Ecological Momentary Assessment (EMA) data, or intensive longitudinal data, from an adolescent smoking study. These mixed-effects ordinal location scale models have useful applications in mental health research where outcomes are often ordinal and there is interest in subject heterogeneity, both between- and within-subjects.
PMCID: PMC2847414  PMID: 20357914
Complex variation; Mood variation; Heterogeneity; Variance modeling
12.  Describing the longitudinal course of major depression using Markov models: Data integration across three national surveys 
Most epidemiological studies of major depression report period prevalence estimates. These are of limited utility in characterizing the longitudinal epidemiology of this condition. Markov models provide a methodological framework for increasing the utility of epidemiological data. Markov models relating incidence and recovery to major depression prevalence have been described in a series of prior papers. In this paper, the models are extended to describe the longitudinal course of the disorder.
Data from three national surveys conducted by the Canadian national statistical agency (Statistics Canada) were used in this analysis. These data were integrated using a Markov model. Incidence, recurrence and recovery were represented as weekly transition probabilities. Model parameters were calibrated to the survey estimates.
The population was divided into three categories: low, moderate and high recurrence groups. The size of each category was approximated using lifetime data from a study using the WHO Mental Health Composite International Diagnostic Interview (WMH-CIDI). Consistent with previous work, transition probabilities reflecting recovery were high in the initial weeks of the episodes, and declined by a fixed proportion with each passing week.
Markov models provide a framework for integrating psychiatric epidemiological data. Previous studies have illustrated the utility of Markov models for decomposing prevalence into its various determinants: incidence, recovery and mortality. This study extends the Markov approach by distinguishing several recurrence categories.
PMCID: PMC1298330  PMID: 16288648
Depressive Disorder; Epidemiologic Methods; Markov Chain
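The statement that weekly recovery probabilities start high and decline by a fixed proportion each week can be made concrete with a short sketch. The function name and the numeric values are hypothetical and are not the calibrated parameters from the study; incidence, recurrence groups, and mortality are left out.

```python
def episode_survival(initial_recovery, weekly_decline, n_weeks):
    """Probability that an episode is still ongoing after each week.

    The weekly recovery probability starts at initial_recovery and shrinks
    by the fixed proportion weekly_decline with each passing week.
    """
    remaining = 1.0
    curve = []
    for week in range(n_weeks):
        recovery = initial_recovery * (1.0 - weekly_decline) ** week
        remaining *= 1.0 - recovery
        curve.append(remaining)
    return curve

# Hypothetical values: 10% recovery in the first week, halving each week.
curve = episode_survival(0.1, 0.5, 2)
```

Because the recovery probability keeps shrinking, the curve flattens over time, so a fraction of episodes persists for a long period under this form.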
13.  Evaluating Prognostic Accuracy of Biomarkers under Competing Risk 
Biometrics  2011;68(2):388-396.
To develop more targeted intervention strategies, an important research goal is to identify markers predictive of clinical events. A crucial step towards this goal is to characterize the clinical performance of a marker for predicting different types of events. In this manuscript, we present statistical methods for evaluating the performance of a prognostic marker in predicting multiple competing events. To capture the potential time-varying predictive performance of the marker and incorporate competing risks, we define time- and cause-specific accuracy summaries by stratifying cases based on causes of failure. Such definition would allow one to evaluate the predictive accuracy of a marker for each type of event and compare its predictiveness across event types. Extending the nonparametric crude cause-specific ROC curve estimators by Saha and Heagerty (2010), we develop inference procedures for a range of cause-specific accuracy summaries. To estimate the accuracy measures and assess how covariates may affect the accuracy of a marker under the competing risk setting, we consider two forms of semiparametric models through the cause-specific hazard framework. These approaches enable a flexible modeling of the relationships between the marker and failure times for each cause, while efficiently accommodating additional covariates. We investigate the asymptotic property of the proposed accuracy estimators and demonstrate the finite sample performance of these estimators through simulation studies. The proposed procedures are illustrated with data from a prostate cancer prognostic study.
PMCID: PMC3694786  PMID: 22150576
Biomarker evaluation; Cause-specific Hazard; Competing risk; Negative predictive value; Positive predictive value; Receiver Operating Characteristics Curve (ROC curve); Survival analysis
14.  Visualising and modelling changes in categorical variables in longitudinal studies 
Graphical techniques can provide visually compelling insights into complex data patterns. In this paper we present a type of lasagne plot showing changes in categorical variables for participants measured at regular intervals over time and propose statistical models to estimate distributions of marginal and transitional probabilities.
The plot uses stacked bars to show the distribution of categorical variables at each time interval, with different colours to depict different categories and changes in colours showing trajectories of participants over time. The models are based on nominal logistic regression which is appropriate for both ordinal and nominal categorical variables. To illustrate the plots and models we analyse data on smoking status, body mass index (BMI) and physical activity level from a longitudinal study on women’s health. To estimate marginal distributions we fit survey wave as an explanatory variable whereas for transitional distributions we fit status of participants (e.g. smoking status) at previous surveys.
For the illustrative data the marginal models showed BMI increasing, physical activity decreasing and smoking decreasing linearly over time at the population level. The plots and transition models showed smoking status to be highly predictable for individuals whereas BMI was only moderately predictable and physical activity was virtually unpredictable. Most of the predictive power was obtained from participant status at the previous survey. Predicted probabilities from the models mostly agreed with observed probabilities indicating adequate goodness-of-fit.
The proposed form of lasagne plot provides a simple visual aid to show transitions in categorical variables over time in longitudinal studies. The suggested models complement the plot and allow formal testing and estimation of marginal and transitional distributions. These simple tools can provide valuable insights into categorical data on individuals measured at regular intervals over time.
PMCID: PMC3938907  PMID: 24576041
Categorical variables; Graphical methods; Longitudinal studies; Marginal distribution; Nominal regression; Transition probabilities
15.  A Joint Model for Longitudinal Measurements and Survival Data in the Presence of Multiple Failure Types 
Biometrics  2007;64(3):762-771.
In this article we study a joint model for longitudinal measurements and competing risks survival data. Our joint model provides a flexible approach to handle possible nonignorable missing data in the longitudinal measurements due to dropout. It is also an extension of previous joint models with a single failure type, offering a possible way to model informatively censored events as a competing risk. Our model consists of a linear mixed effects submodel for the longitudinal outcome and a proportional cause-specific hazards frailty submodel (Prentice et al., 1978, Biometrics 34, 541-554) for the competing risks survival data, linked together by some latent random effects. We propose to obtain the maximum likelihood estimates of the parameters by an expectation maximization (EM) algorithm and estimate their standard errors using a profile likelihood method. The developed method works well in our simulation studies and is applied to a clinical trial for the scleroderma lung disease.
PMCID: PMC2751647  PMID: 18162112
Cause-specific hazard; Competing risks; EM algorithm; Joint modeling; Longitudinal data; Mixed effects model
16.  Nonparametric Estimation of a Recurrent Survival Function 
Recurrent event data are frequently encountered in studies with longitudinal designs. Let the recurrence time be the time between two successive recurrent events. Recurrence times can be treated as a type of correlated survival data in statistical analysis. In general, because of the ordinal nature of recurrence times, statistical methods that are appropriate for standard correlated survival data in marginal models may not be applicable to recurrence time data. Specifically, for estimating the marginal survival function, the Kaplan-Meier estimator derived from the pooled recurrence times serves as a consistent estimator for standard correlated survival data but not for recurrence time data. In this article we consider the problem of how to estimate the marginal survival function in nonparametric models. A class of nonparametric estimators is introduced. The appropriateness of the estimators is confirmed by statistical theory and simulations. Simulation and analysis from schizophrenia data are presented to illustrate the estimators' performance.
PMCID: PMC3826567  PMID: 24244058
Correlated survival data; Frailty; Kaplan-Meier estimate; Longitudinal designs; Recurrent event
17.  A General Class of Pattern Mixture Models for Nonignorable Dropout with Many Possible Dropout Times 
Biometrics  2007;64(2):538-545.
In this article we consider the problem of fitting pattern mixture models to longitudinal data when there are many unique dropout times. We propose a marginally specified latent class pattern mixture model. The marginal mean is assumed to follow a generalized linear model, whereas the mean conditional on the latent class and random effects is specified separately. Because the dimension of the parameter vector of interest (the marginal regression coefficients) does not depend on the assumed number of latent classes, we propose to treat the number of latent classes as a random variable. We specify a prior distribution for the number of classes, and calculate (approximate) posterior model probabilities. In order to avoid the complications with implementing a fully Bayesian model, we propose a simple approximation to these posterior probabilities. The ideas are illustrated using data from a longitudinal study of depression in HIV-infected women.
PMCID: PMC2791415  PMID: 17900312
Bayesian model averaging; Incomplete data; Latent variable; Marginal model; Random effects
18.  Partially ordered mixed hidden Markov model for the disablement process of older adults 
At both the individual and societal levels, the health and economic burden of disability in older adults is enormous in developed countries, including the U.S. Recent studies have revealed that the disablement process in older adults often comprises episodic periods of impaired functioning and periods that are relatively free of disability, amid a secular and natural trend of decline in functioning. Rather than an irreversible, progressive event that is analogous to a chronic disease, disability is better conceptualized and mathematically modeled as states that do not necessarily follow a strict linear order of good-to-bad. Statistical tools, including Markov models, which allow bidirectional transition between states, and random effects models, which allow individual-specific rate of secular decline, are pertinent. In this paper, we propose a mixed effects, multivariate, hidden Markov model to handle partially ordered disability states. The model generalizes the continuation ratio model for ordinal data in the generalized linear model literature and provides a formal framework for testing the effects of risk factors and/or an intervention on the transitions between different disability states. Under a generalization of the proportional odds ratio assumption, the proposed model circumvents the problem of a potentially large number of parameters when the number of states and the number of covariates are substantial. We describe a maximum likelihood method for estimating the partially ordered, mixed effects model and show how the model can be applied to a longitudinal data set that consists of N = 2,903 older adults followed for 10 years in the Health, Aging and Body Composition Study. We further statistically test the effects of various risk factors upon the probabilities of transition into various severe disability states. The result can be used to inform geriatric and public health science researchers who study the disablement process.
PMCID: PMC3777389  PMID: 24058222
Latent Markov model; continuation ratio model; EM algorithm; generalized linear model; Health ABC study
19.  Association between local indoor smoking ordinances in Massachusetts and cigarette smoking during pregnancy: a multilevel analysis 
Tobacco control  2011;22(3):184-189.
To estimate the association between local clean indoor air ordinances and prenatal maternal smoking across 351 municipalities in Massachusetts before the 2004 statewide ban and to test the effect of time since ordinance adoption on the association.
The authors linked 2002 birth certificate data of women who gave birth in the state and reported a Massachusetts residence (n=67 584) to a database of indoor smoking ordinances in all municipalities. Multilevel regression models accounting for individual- and municipality-level variables estimate the associations between the presence of local smoking ordinances, strength of the ordinances, time since ordinance adoption and prenatal smoking.
Compared with those living in municipalities with no ordinances, women living in municipalities with a smoking ordinance had lower odds of prenatal smoking (OR=0.72, CI=0.53 to 0.98). No effect was found for 100% smoke-free ordinances. For the analyses testing the effect of time, pregnant women living in municipalities with ordinances enacted >2 years were less likely to smoke than those in municipalities with more recent (<1 year) ordinances.
Preventing smoking among women of reproductive age is a public health priority. This study suggests that indoor smoking ordinances were associated with lower prenatal smoking prevalence and the favourable effect increased over time. Findings highlight the public health benefit of tobacco control policies.
PMCID: PMC3401240  PMID: 22166267
20.  Joint generalized estimating equations for multivariate longitudinal binary outcomes with missing data: An application to AIDS data 
In a large, prospective longitudinal study designed to monitor cardiac abnormalities in children born to HIV-infected women, instead of a single outcome variable, there are multiple binary outcomes (e.g., abnormal heart rate, abnormal blood pressure, abnormal heart wall thickness) considered as joint measures of heart function over time. In the presence of missing responses at some time points, longitudinal marginal models for these multiple outcomes can be estimated using generalized estimating equations (GEE) (Liang and Zeger, 1986), and consistent estimates can be obtained under the assumption of a missing completely at random (MCAR) mechanism. When the missing data mechanism is missing at random (MAR), that is the probability of missing a particular outcome at a time-point depends on observed values of that outcome and the remaining outcomes at other time points, we propose joint estimation of the marginal models using a single modified GEE based on an EM-type algorithm. The proposed method is motivated by the longitudinal study of cardiac abnormalities in children born to HIV-infected women and analyses of these data are presented to illustrate the application of the method. Further, in an asymptotic study of bias, we show that under an MAR mechanism in which missingness depends on all observed outcome variables, our joint estimation via the modified GEE produces almost unbiased estimates, provided the correlation model has been correctly specified, whereas estimates from standard GEE can lead to substantial bias.
PMCID: PMC2888330  PMID: 20585409
EM-type algorithm; generalized estimating equations; missing at random; missing completely at random
The annals of applied statistics  2014;8(2):747-776.
The Agincourt Health and Demographic Surveillance System has since 2001 conducted a biannual household asset survey in order to quantify household socio-economic status (SES) in a rural population living in northeast South Africa. The survey contains binary, ordinal and nominal items. In the absence of income or expenditure data, the SES landscape in the study population is explored and described by clustering the households into homogeneous groups based on their asset status.
A model-based approach to clustering the Agincourt households, based on latent variable models, is proposed. In the case of modeling binary or ordinal items, item response theory models are employed. For nominal survey items, a factor analysis model, similar in nature to a multinomial probit model, is used. Both model types have an underlying latent variable structure—this similarity is exploited and the models are combined to produce a hybrid model capable of handling mixed data types. Further, a mixture of the hybrid models is considered to provide clustering capabilities within the context of mixed binary, ordinal and nominal response data. The proposed model is termed a mixture of factor analyzers for mixed data (MFA-MD).
The MFA-MD model is applied to the survey data to cluster the Agincourt households into homogeneous groups. The model is estimated within the Bayesian paradigm, using a Markov chain Monte Carlo algorithm. Intuitive groupings result, providing insight to the different socio-economic strata within the Agincourt region.
PMCID: PMC4256055  PMID: 25485026
Clustering; mixed data; item response theory; Metropolis-within-Gibbs
22.  Threshold Regression for Survival Data with Time-varying Covariates 
Statistics in medicine  2010;29(7-8):896-905.
Time-to-event data with time-varying covariates pose an interesting challenge for statistical modeling and inference, especially where the data require a regression structure but are not consistent with the proportional hazard assumption. Threshold regression (TR) is a relatively new methodology based on the concept that degradation or deterioration of a subject’s health follows a stochastic process and failure occurs when the process first reaches a failure state or threshold (a first-hitting-time). Survival data with time-varying covariates consist of sequential observations on the level of degradation and/or on covariates of the subject, prior to the occurrence of the failure event. Encounters with this type of data structure abound in practical settings for survival analysis and there is a pressing need for simple regression methods to handle the longitudinal aspect of the data. Using a Markov property to decompose a longitudinal record into a series of single records is one strategy for dealing with this type of data. This study looks at the theoretical conditions for which this Markov approach is valid. The approach is called threshold regression with Markov decomposition or Markov TR for short. A number of important special cases, such as data with unevenly spaced time points and competing risks as stopping modes, are discussed. We show that a proportional hazards regression model with time-varying covariates is consistent with the Markov TR model. The Markov TR procedure is illustrated by a case application to a study of lung cancer risk. The procedure is also shown to be consistent with the use of an alternative time scale. Finally, we present the connection of the procedure to the concept of a collapsible survival model.
PMCID: PMC3063107  PMID: 20213704
Competing risks; first hitting time; latent process; longitudinal data; Markov property; stopping time; unevenly spaced time points; Wiener diffusion process
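The first-hitting-time idea behind threshold regression can be illustrated with a short sketch. It assumes the simplest textbook case: a latent health process that starts at y0 > 0 and follows a Wiener diffusion with constant drift mu and volatility sigma, with failure when the process first reaches zero. The closed-form survival function used below is the standard result for that case; the function names and parameter values are hypothetical, and the sketch does not reproduce the Markov decomposition or the covariate regression described in the abstract.

```python
import math


def normal_cdf(x: float) -> float:
    """Standard normal CDF computed from the error function."""
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))


def fht_survival(t: float, y0: float, mu: float, sigma: float) -> float:
    """P(T > t), where T is the first time y0 + mu*s + sigma*W(s) reaches zero.

    Assumes y0 > 0 and sigma > 0; W is a standard Wiener process.
    """
    if t <= 0:
        return 1.0  # no failure can have occurred before time zero
    scale = sigma * math.sqrt(t)
    no_reflection = normal_cdf((y0 + mu * t) / scale)
    reflection = math.exp(-2.0 * y0 * mu / sigma ** 2) * normal_cdf((-y0 + mu * t) / scale)
    return no_reflection - reflection


# Hypothetical parameter values, chosen only for illustration.
for t in (1.0, 2.0, 5.0, 10.0):
    print(t, round(fht_survival(t, y0=2.0, mu=-0.5, sigma=1.0), 4))
```

In a regression setting, y0 and mu would typically be linked to covariates; with a negative drift the survival probability decays to zero, while with a positive drift a fraction of subjects never reaches the threshold.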
23.  Multivariate decoding of brain images using ordinal regression 
Neuroimage  2013;81(100):347-357.
Neuroimaging data are increasingly being used to predict potential outcomes or groupings, such as clinical severity, drug dose response, and transitional illness states. In these examples, the variable (target) we want to predict is ordinal in nature. Conventional classification schemes assume that the targets are nominal and hence ignore their ranked nature, whereas parametric and/or non-parametric regression models enforce a metric notion of distance between classes. Here, we propose a novel, alternative multivariate approach that overcomes these limitations — whole brain probabilistic ordinal regression using a Gaussian process framework. We applied this technique to two data sets of pharmacological neuroimaging data from healthy volunteers. The first study was designed to investigate the effect of ketamine on brain activity and its subsequent modulation with two compounds — lamotrigine and risperidone. The second study investigates the effect of scopolamine on cerebral blood flow and its modulation using donepezil. We compared ordinal regression to multi-class classification schemes and metric regression. Considering the modulation of ketamine with lamotrigine, we found that ordinal regression significantly outperformed multi-class classification and metric regression in terms of accuracy and mean absolute error. However, for risperidone ordinal regression significantly outperformed metric regression but performed similarly to multi-class classification both in terms of accuracy and mean absolute error. For the scopolamine data set, ordinal regression was found to outperform both multi-class and metric regression techniques considering the regional cerebral blood flow in the anterior cingulate cortex. Ordinal regression was thus the only method that performed well in all cases. Our results indicate the potential of an ordinal regression approach for neuroimaging data while providing a fully probabilistic framework with elegant approaches for model selection.
•Often in neuroimaging the independent variables are ranked or ordered.
•Classification and regression models cannot explicitly model an ordinal target.
•We present a novel multivariate ordinal regression approach for neuroimaging data.
•Our results show that ordinal regression is a powerful method for ranking data.
PMCID: PMC4068378  PMID: 23684876
Multivariate; Ordinal regression; Gaussian processes; Pharmacological MRI; Ketamine; Scopolamine
24.  Implementation and Evaluation of the SAEM Algorithm for Longitudinal Ordered Categorical Data with an Illustration in Pharmacokinetics–Pharmacodynamics 
The AAPS Journal  2010;13(1):44-53.
Analysis of longitudinal ordered categorical efficacy or safety data in clinical trials using mixed models is increasingly performed. However, the available maximum likelihood algorithms that rely on an approximation of the likelihood integral, including the Laplace approach, may give rise to biased parameter estimates. The SAEM algorithm is an efficient and powerful tool in the analysis of continuous/count mixed models. The aim of this study was to implement and investigate the performance of the SAEM algorithm for longitudinal categorical data. The SAEM algorithm is extended for parameter estimation in ordered categorical mixed models together with an estimation of the Fisher information matrix and the likelihood. We ran Monte Carlo simulations based on previously published scenarios evaluated with NONMEM. Accuracy and precision in parameter estimation and standard error estimates were assessed in terms of relative bias and root mean square error. This algorithm was illustrated on the simultaneous analysis of pharmacokinetic and discretized efficacy data obtained after a single dose of warfarin in healthy volunteers. The new SAEM algorithm is implemented in MONOLIX 3.1 for discrete mixed models. The analyses show that for parameter estimation, the relative bias is low for both fixed effects and variance components in all models studied. Estimated and empirical standard errors are similar. The warfarin example illustrates how simple and rapid it is to analyze continuous and discrete data simultaneously with MONOLIX 3.1. The SAEM algorithm is extended for analysis of longitudinal categorical data. It provides accurate estimates of parameters and standard errors. The estimation is fast and stable.
PMCID: PMC3032088  PMID: 21063925
categorical data; mixed models; MONOLIX; proportional odds model; SAEM
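The proportional odds model named in the keywords can be sketched in a few lines. The example assumes one common parameterization, logit P(Y ≤ k) = c_k − η, where the c_k are increasing cutpoints and η is a linear predictor (which in a mixed model would include a subject-level random effect). The function names and numbers are hypothetical, and the sketch shows only how category probabilities are obtained; it does not show SAEM estimation or any MONOLIX interface.

```python
import math


def logistic(z: float) -> float:
    """Inverse logit."""
    return 1.0 / (1.0 + math.exp(-z))


def category_probs(cutpoints: list[float], eta: float) -> list[float]:
    """Return P(Y = k) for k = 0..K given K increasing cutpoints.

    Assumed parameterization: logit P(Y <= k) = cutpoints[k] - eta.
    """
    # Cumulative probabilities P(Y <= k); the last category closes at 1.
    cumulative = [logistic(c - eta) for c in cutpoints] + [1.0]
    probs = []
    previous = 0.0
    for cum in cumulative:
        probs.append(cum - previous)  # difference of adjacent cumulative probabilities
        previous = cum
    return probs


# Hypothetical cutpoints for a three-category outcome.
print(category_probs([-1.0, 1.0], eta=0.0))
print(category_probs([-1.0, 1.0], eta=2.0))
```

Under this sign convention a larger η moves probability mass toward the higher categories while the cutpoints stay fixed, which is the "proportional odds" restriction.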
25.  Estimating time-to-event from longitudinal ordinal data using random-effects Markov models: application to multiple sclerosis progression 
Biostatistics (Oxford, England)  2008;9(4):750-764.
Longitudinal ordinal data are common in many scientific studies, including those of multiple sclerosis (MS), and are frequently modeled using Markov dependency. Several authors have proposed random-effects Markov models to account for heterogeneity in the population. In this paper, we go one step further and study prediction based on random-effects Markov models. In particular, we show how to calculate the probabilities of future events and confidence intervals for those probabilities, given observed data on the ordinal outcome and a set of covariates, and how to update them over time. We discuss the usefulness of depicting these probabilities for visualization and interpretation of model results and illustrate our method using data from a phase III clinical trial that evaluated the utility of interferon beta-1a (trademark Avonex) in patients with relapsing–remitting MS.
PMCID: PMC2536724  PMID: 18424785
Markov model; Ordinal response; Prediction; Transition model
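The kind of future-event probability described in this abstract can be illustrated with a plain first-order Markov chain. The sketch below assumes a fixed, known transition matrix and computes the probability of reaching a target ordinal state within a given number of visits by making that state absorbing. It is a simplification: the random effects, covariates, and confidence intervals discussed in the abstract are omitted, and the function name and transition probabilities are hypothetical.

```python
def prob_reach_within(P: list[list[float]], start: int, target: int, steps: int) -> float:
    """Probability that a chain started in `start` visits `target` within `steps` transitions.

    P[i][j] is the one-step probability of moving from state i to state j.
    """
    n = len(P)
    # Copy P and make the target state absorbing, so that once it is
    # reached the probability mass stays there.
    Q = [row[:] for row in P]
    Q[target] = [1.0 if j == target else 0.0 for j in range(n)]

    # Start with all mass on the initial state and propagate it forward.
    dist = [1.0 if i == start else 0.0 for i in range(n)]
    for _ in range(steps):
        dist = [sum(dist[i] * Q[i][j] for i in range(n)) for j in range(n)]
    return dist[target]


# Hypothetical three-state ordinal scale (0 = mild, 1 = moderate, 2 = severe).
P = [
    [0.85, 0.13, 0.02],
    [0.10, 0.75, 0.15],
    [0.02, 0.18, 0.80],
]
for k in (1, 2, 4, 8):
    print(k, round(prob_reach_within(P, start=0, target=2, steps=k), 4))
```

Recomputing the same quantity from a subject's most recently observed state gives a simple version of updating the prediction over time.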
